We’ve looked at a few different ways in which we can build models this week, including how to prepare them properly. This weekend we’ll build a multiple linear regression model on a dataset which will need some preparation. The data can be found in the data folder, along with a data dictionary.
We want to investigate the avocado dataset and, in particular, to model the AveragePrice of the avocados. Use the tools we’ve worked with this week to prepare your dataset and find appropriate predictors. Once you’ve built your model, use the validation techniques discussed on Wednesday to evaluate it. Feel free to focus on building either an explanatory or a predictive model, or both if you are feeling energetic!
As part of the MVP we want you not just to run the code but also to have a go at interpreting the results and to write your thinking in comments in your script.
Hints and tips
region may lead to many dummy variables. Think carefully about whether to include this variable or not (there is no one ‘right’ answer to this!)
Date will not be needed in your models, but can you extract any useful features out of Date before you discard it?
… leaps or glmulti to help with this.

Load libraries:
library(tidyverse)
library(GGally)
library(modelr)
library(janitor)
Load dataset and examine it:
avocados <- clean_names(read_csv("data/avocado.csv"))
head(avocados)
Ok, we have 14 variables, and some of them are clearly not useful (x1, for example, is just an index column). I’m not sure whether total_bags is the sum of small_bags, large_bags and x_large_bags, so I’ll check that first.
# check to see if total_bags variable is just the sum of the other three
avocados %>%
mutate(total_sum = small_bags + large_bags + x_large_bags) %>%
select(total_bags, total_sum)
Yep, the total_bags column is just the sum of the other three, so this is another variable I can get rid of. I can also check the same for volume:
# check to see if total_volume is just the sum of the three PLU columns (x4046, x4225, x4770)
avocados %>%
mutate(total_sum = x4046 + x4225 + x4770) %>%
select(total_volume, total_sum)
Nope, these aren’t the same, so total_volume carries information beyond those three columns and we can keep all of them in.
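Eyeballing two printed columns only covers the first few rows. A minimal sketch of making the same check explicit in base R, on a small hypothetical data frame (the values and the helper name is_exact_sum() are made up for illustration), compares a stated total against the row-wise sum of its parts with a numeric tolerance:

```r
# Hypothetical toy values standing in for the bag columns
bags <- data.frame(
  small_bags   = c(100.5, 250.25, 80),
  large_bags   = c(10, 0.75, 5.5),
  x_large_bags = c(0, 1, 0)
)
bags$total_bags <- bags$small_bags + bags$large_bags + bags$x_large_bags

# Hypothetical helper: TRUE if `total` equals the row sums of `parts`.
# all.equal() allows for floating-point rounding; unname() drops the
# row names that rowSums() attaches to its result.
is_exact_sum <- function(total, parts) {
  isTRUE(all.equal(total, unname(rowSums(parts))))
}

bag_parts <- bags[, c("small_bags", "large_bags", "x_large_bags")]
is_exact_sum(bags$total_bags, bag_parts)      # TRUE: total is redundant
is_exact_sum(bags$total_bags + 5, bag_parts)  # FALSE: total holds extra information
```

On the real data the call would look like is_exact_sum(avocados$total_bags, avocados[, c("small_bags", "large_bags", "x_large_bags")]), and the same pattern works for the volume columns.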
Now let’s check how many different levels of each categorical variable we have.
avocados %>%
distinct(region) %>%
summarise(number_of_regions = n())
avocados %>%
distinct(date) %>%
summarise(
number_of_dates = n(),
min_date = min(date),
max_date = max(date)
)
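The level count matters because of how lm() encodes a factor: with R’s default treatment contrasts, a factor with k levels becomes an intercept plus k − 1 dummy columns. A small sketch on a hypothetical four-level factor shows the design matrix lm() would build:

```r
# Hypothetical data: one numeric response and a four-level factor
toy <- data.frame(
  price  = c(1.2, 1.5, 0.9, 1.1, 1.4, 1.0, 1.3, 0.8),
  region = factor(rep(c("A", "B", "C", "D"), times = 2))
)

n_levels <- nlevels(toy$region)
X <- model.matrix(price ~ region, data = toy)  # the design matrix lm() uses

n_levels     # 4 categories
colnames(X)  # "(Intercept)" "regionB" "regionC" "regionD"
```

Level A becomes the baseline absorbed into the intercept. With 54 regions in the avocado data that means 53 dummy columns, which is where the 53 numerator degrees of freedom in the region-only model summary further down come from.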
The region variable has many levels, so it will lead to many dummy variables, but we can try leaving it in. We should also examine date and pull out whatever features we can from it. Including every single date would be too much, so instead we can extract the parts of the date that might be useful, for example the quarter (year is already its own column).
So, let’s do this now: remove the variables we don’t need, change our categorical variables to factors, and extract the quarter from date in case it is useful (then get rid of date).
library(lubridate)
trimmed_avocados <- avocados %>%
mutate(
quarter = as_factor(quarter(date)),
year = as_factor(year),
type = as_factor(type),
region = as_factor(region)
) %>%
select(-c(x1, date, total_bags)) # drop the index column, the raw date and the redundant total
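For reference, lubridate’s quarter() maps each month to its quarter. A base R sketch of the same feature extraction on a few hypothetical dates also pulls out the month, in case a finer seasonal feature is worth trying (month is an extra idea here, not something this analysis uses):

```r
# Hypothetical example dates
dates <- as.Date(c("2015-01-04", "2015-05-10", "2016-08-21", "2017-12-31"))

month_num <- as.integer(format(dates, "%m"))

date_features <- data.frame(
  year    = factor(format(dates, "%Y")),
  month   = factor(month_num),
  quarter = factor((month_num - 1) %/% 3 + 1)  # months 1-3 -> 1, ..., 10-12 -> 4
)
date_features
```

Storing these as factors rather than numbers lets each quarter get its own coefficient instead of forcing a straight-line trend across Q1 to Q4.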
Now that we’ve done our cleaning, we can check for aliased variables (i.e. combinations of variables in which one or more of the variables can be calculated exactly from the others):
alias(average_price ~ ., data = trimmed_avocados)
## Model :
## average_price ~ total_volume + x4046 + x4225 + x4770 + small_bags +
## large_bags + x_large_bags + type + year + region + quarter
Nice, we don’t find any aliases. So we can keep going.
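To see what a positive result would look like, here is a small simulated example (the column names are hypothetical) in which one predictor is the exact sum of two others, much as total_bags was before we dropped it:

```r
set.seed(42)

# Simulated data: `total` is constructed as exactly a + b
toy <- data.frame(a = runif(50), b = runif(50))
toy$total <- toy$a + toy$b
toy$y <- 2 * toy$a - toy$b + rnorm(50, sd = 0.1)

fit_aliased <- lm(y ~ a + b + total, data = toy)
coef(fit_aliased)            # the coefficient for total is NA
alias(fit_aliased)$Complete  # expresses total as 1 * a + 1 * b

# Without the redundant column there is nothing to report
alias(y ~ a + b, data = toy)$Complete  # NULL
```

An empty Complete block, as in the avocado output above where only the Model line is printed, means no predictor is an exact linear combination of the others.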
We need to decide which variable to put in our model first. To do this, we should visualise the data. Because we have so much data, ggpairs() might take a while to run, so we can split the variables into smaller groups.
# let's start by plotting the volume variables
trimmed_avocados %>%
select(average_price, total_volume, x4046, x4225, x4770) %>%
ggpairs() +
theme_grey(base_size = 8) # font size of labels
Hmm, some of these are highly correlated with one another. That suggests we won’t need all of them in our model, so we could remove x4225 and x4770 from our dataset to give ourselves fewer variables.
trimmed_avocados <- trimmed_avocados %>%
select(-x4225, -x4770)
In terms of variables that correlate well with average_price… well, none of them do particularly well. But that’s life. Our x4046 variable is probably our first candidate.
Next we can look at our bag variables.
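The scatterplot matrix can be backed up with numbers. A minimal base R sketch on simulated data (the vol_* names and the 0.8 cut-off are illustrative assumptions, not rules) lists each pair of predictors whose absolute correlation exceeds a threshold:

```r
set.seed(1)
n <- 200

# Simulated volumes: vol_b is a noisy copy of vol_a, vol_c is independent
vol_a <- rlnorm(n, meanlog = 10, sdlog = 1)
vol_b <- vol_a * runif(n, 0.9, 1.1)
vol_c <- rlnorm(n, meanlog = 8, sdlog = 1)
preds <- data.frame(vol_a, vol_b, vol_c)

cm <- cor(preds)

# upper.tri() keeps each pair once and skips the diagonal
idx <- which(abs(cm) > 0.8 & upper.tri(cm), arr.ind = TRUE)

high_pairs <- data.frame(
  var1 = rownames(cm)[idx[, "row"]],
  var2 = colnames(cm)[idx[, "col"]],
  r    = cm[idx]
)
high_pairs
```

Only the vol_a/vol_b pair is flagged. Variance inflation factors (for example car::vif() on a fitted model) are another common way to spot predictors that largely duplicate each other.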
trimmed_avocados %>%
select(average_price, small_bags, large_bags, x_large_bags) %>%
ggpairs() +
theme_grey(base_size = 8) # font size of labels
Hmm, again… not that promising. Some of the variables are highly correlated with one another, but not much seems highly correlated with average_price.
We can look at some of our categorical variables next:
trimmed_avocados %>%
select(average_price, type, year, quarter) %>%
ggpairs() +
theme_grey(base_size = 8) # font size of labels
This seems better! The type variable shows clear variation between the boxplots, which suggests that conventional and organic avocados have different prices (which makes sense).
Finally, we can make a boxplot of our region variable. Because this has so many levels, it makes sense to plot it by itself so we can see it.
trimmed_avocados %>%
ggplot(aes(x = region, y = average_price)) +
geom_boxplot() +
theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5))
Ok, there seems to be some variation in average price between regions, so region could be a promising predictor.
Let’s start by testing competing models. We decided that x4046, type and region seemed reasonable:
library(ggfortify)
# build the model
model1a <- lm(average_price ~ x4046, data = trimmed_avocados)
# check the diagnostics
autoplot(model1a)
# check the summary output
summary(model1a)
##
## Call:
## lm(formula = average_price ~ x4046, data = trimmed_avocados)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.98539 -0.29842 -0.03531 0.25459 1.82475
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.425e+00 2.993e-03 476.29 <2e-16 ***
## x4046 -6.631e-08 2.305e-09 -28.77 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3939 on 18247 degrees of freedom
## Multiple R-squared: 0.0434, Adjusted R-squared: 0.04334
## F-statistic: 827.8 on 1 and 18247 DF, p-value: < 2.2e-16
# build the model
model1b <- lm(average_price ~ type, data = trimmed_avocados)
# check the diagnostics
autoplot(model1b)
# check the summary output
summary(model1b)
##
## Call:
## lm(formula = average_price ~ type, data = trimmed_avocados)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.21400 -0.20400 -0.02804 0.18600 1.59600
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.158040 0.003321 348.7 <2e-16 ***
## typeorganic 0.495959 0.004697 105.6 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3173 on 18247 degrees of freedom
## Multiple R-squared: 0.3793, Adjusted R-squared: 0.3792
## F-statistic: 1.115e+04 on 1 and 18247 DF, p-value: < 2.2e-16
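With a single two-level factor as the only predictor, the least-squares intercept is the mean of the reference level and the dummy coefficient is the difference between the two group means. A quick check on the built-in mtcars data, used here purely as a stand-in with am playing the role of type:

```r
# mtcars stands in for the avocado data; am plays the role of type
cars <- transform(mtcars, am = factor(am, labels = c("automatic", "manual")))

fit <- lm(mpg ~ am, data = cars)
group_means <- tapply(cars$mpg, cars$am, mean)

coef(fit)[["(Intercept)"]]                           # equals the automatic mean
group_means[["automatic"]]
coef(fit)[["ammanual"]]                              # equals manual mean minus automatic mean
group_means[["manual"]] - group_means[["automatic"]]
```

Read the same way, model1b says conventional avocados average about 1.16 and organic ones about 1.158 + 0.496 ≈ 1.65, with type alone explaining roughly 38% of the variation in price (R² = 0.3793).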
# build the model
model1c <- lm(average_price ~ region, data = trimmed_avocados)
# check the diagnostics
autoplot(model1c)
# check the summary output
summary(model1c)
##
## Call:
## lm(formula = average_price ~ region, data = trimmed_avocados)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.97095 -0.28423 -0.03432 0.25207 1.76115
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.561036 0.020006 78.029 < 2e-16 ***
## regionAtlanta -0.223077 0.028293 -7.885 3.33e-15 ***
## regionBaltimoreWashington -0.026805 0.028293 -0.947 0.34344
## regionBoise -0.212899 0.028293 -7.525 5.52e-14 ***
## regionBoston -0.030148 0.028293 -1.066 0.28663
## regionBuffaloRochester -0.044201 0.028293 -1.562 0.11824
## regionCalifornia -0.165710 0.028293 -5.857 4.79e-09 ***
## regionCharlotte 0.045000 0.028293 1.591 0.11173
## regionChicago -0.004260 0.028293 -0.151 0.88031
## regionCincinnatiDayton -0.351834 0.028293 -12.436 < 2e-16 ***
## regionColumbus -0.308254 0.028293 -10.895 < 2e-16 ***
## regionDallasFtWorth -0.475444 0.028293 -16.805 < 2e-16 ***
## regionDenver -0.342456 0.028293 -12.104 < 2e-16 ***
## regionDetroit -0.284941 0.028293 -10.071 < 2e-16 ***
## regionGrandRapids -0.056036 0.028293 -1.981 0.04765 *
## regionGreatLakes -0.222485 0.028293 -7.864 3.94e-15 ***
## regionHarrisburgScranton -0.047751 0.028293 -1.688 0.09147 .
## regionHartfordSpringfield 0.257604 0.028293 9.105 < 2e-16 ***
## regionHouston -0.513107 0.028293 -18.136 < 2e-16 ***
## regionIndianapolis -0.247041 0.028293 -8.732 < 2e-16 ***
## regionJacksonville -0.050089 0.028293 -1.770 0.07668 .
## regionLasVegas -0.180118 0.028293 -6.366 1.98e-10 ***
## regionLosAngeles -0.345030 0.028293 -12.195 < 2e-16 ***
## regionLouisville -0.274349 0.028293 -9.697 < 2e-16 ***
## regionMiamiFtLauderdale -0.132544 0.028293 -4.685 2.82e-06 ***
## regionMidsouth -0.156272 0.028293 -5.523 3.37e-08 ***
## regionNashville -0.348935 0.028293 -12.333 < 2e-16 ***
## regionNewOrleansMobile -0.256243 0.028293 -9.057 < 2e-16 ***
## regionNewYork 0.166538 0.028293 5.886 4.02e-09 ***
## regionNortheast 0.040888 0.028293 1.445 0.14843
## regionNorthernNewEngland -0.083639 0.028293 -2.956 0.00312 **
## regionOrlando -0.054822 0.028293 -1.938 0.05268 .
## regionPhiladelphia 0.071095 0.028293 2.513 0.01199 *
## regionPhoenixTucson -0.336598 0.028293 -11.897 < 2e-16 ***
## regionPittsburgh -0.196716 0.028293 -6.953 3.70e-12 ***
## regionPlains -0.124527 0.028293 -4.401 1.08e-05 ***
## regionPortland -0.243314 0.028293 -8.600 < 2e-16 ***
## regionRaleighGreensboro -0.005917 0.028293 -0.209 0.83434
## regionRichmondNorfolk -0.269704 0.028293 -9.533 < 2e-16 ***
## regionRoanoke -0.313107 0.028293 -11.067 < 2e-16 ***
## regionSacramento 0.060533 0.028293 2.140 0.03241 *
## regionSanDiego -0.162870 0.028293 -5.757 8.72e-09 ***
## regionSanFrancisco 0.243166 0.028293 8.595 < 2e-16 ***
## regionSeattle -0.118462 0.028293 -4.187 2.84e-05 ***
## regionSouthCarolina -0.157751 0.028293 -5.576 2.50e-08 ***
## regionSouthCentral -0.459793 0.028293 -16.251 < 2e-16 ***
## regionSoutheast -0.163018 0.028293 -5.762 8.45e-09 ***
## regionSpokane -0.115444 0.028293 -4.080 4.52e-05 ***
## regionStLouis -0.130414 0.028293 -4.609 4.06e-06 ***
## regionSyracuse -0.040710 0.028293 -1.439 0.15020
## regionTampa -0.152189 0.028293 -5.379 7.58e-08 ***
## regionTotalUS -0.242012 0.028293 -8.554 < 2e-16 ***
## regionWest -0.288817 0.028293 -10.208 < 2e-16 ***
## regionWestTexNewMexico -0.299334 0.028356 -10.556 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3678 on 18195 degrees of freedom
## Multiple R-squared: 0.1681, Adjusted R-squared: 0.1657
## F-statistic: 69.38 on 53 and 18195 DF, p-value: < 2.2e-16
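Rather than reading three separate summaries, the headline statistics can be pulled out of summary() and put side by side. A sketch on mtcars as a stand-in (the three candidate formulas are illustrative only):

```r
# Stand-in data: mtcars, with cyl treated as categorical
cars <- transform(mtcars, cyl = factor(cyl))

candidates <- list(
  wt  = lm(mpg ~ wt, data = cars),
  cyl = lm(mpg ~ cyl, data = cars),
  hp  = lm(mpg ~ hp, data = cars)
)

fit_table <- data.frame(
  model     = names(candidates),
  r_squared = sapply(candidates, function(m) summary(m)$r.squared),
  adj_r_sq  = sapply(candidates, function(m) summary(m)$adj.r.squared),
  sigma     = sapply(candidates, function(m) summary(m)$sigma),  # residual standard error
  aic       = sapply(candidates, AIC)
)
fit_table[order(-fit_table$adj_r_sq), ]
```

Adjusted R² and AIC both penalise extra coefficients, which matters once a factor with 53 dummies, such as region, is in the running against single-coefficient predictors.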
model1b with type is the best of the three (R² of 0.3793, against 0.0434 for x4046 and 0.1681 for region), so we’ll keep that and re-run ggpairs() with the residuals (again omitting region because it has too many levels to plot).
avocados_remaining_resid <- trimmed_avocados %>%
add_residuals(model1b) %>%
select(-c("average_price", "type", "region"))
ggpairs(avocados_remaining_resid) +
theme_grey(base_size = 8) # this bit just changes the axis label font size so we can see
Again, this isn’t showing any really high correlations between the residuals and any of the remaining variables. Looks like x4046, year and quarter could potentially show something (given the rubbish variables we have).
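Under the hood, add_residuals() appends observed minus fitted values as a resid column. A base R sketch of the same residual-screening step, on mtcars as a stand-in, correlates the residuals of a one-predictor model with every variable not yet in it:

```r
cars <- mtcars
fit <- lm(mpg ~ wt, data = cars)

# Residual = observed minus fitted: the same quantity add_residuals() stores as `resid`
cars$resid <- residuals(fit)

# Correlate the leftover variation with the variables not yet in the model
remaining <- setdiff(names(mtcars), c("mpg", "wt"))
resid_cor <- sapply(remaining, function(v) cor(cars[[v]], cars$resid))
sort(abs(resid_cor), decreasing = TRUE)
```

A variable that correlates with the residuals explains something the current model has missed, which is why it makes a good candidate for the next step.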
trimmed_avocados %>%
add_residuals(model1b) %>%
ggplot(aes(x = region, y = resid)) +
geom_boxplot() +
theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5))
Region also looks like a contender, so x4046, year, quarter and region are our next candidates to try. Let’s do these now.
model2a <- lm(average_price ~ type + x4046, data = trimmed_avocados)
autoplot(model2a)
summary(model2a)
##
## Call:
## lm(formula = average_price ~ type + x4046, data = trimmed_avocados)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.21416 -0.20029 -0.02736 0.18591 1.59589
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.171e+00 3.485e-03 336.13 <2e-16 ***
## typeorganic 4.827e-01 4.802e-03 100.52 <2e-16 ***
## x4046 -2.323e-08 1.898e-09 -12.24 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.316 on 18246 degrees of freedom
## Multiple R-squared: 0.3843, Adjusted R-squared: 0.3843
## F-statistic: 5695 on 2 and 18246 DF, p-value: < 2.2e-16
model2b <- lm(average_price ~ type + year, data = trimmed_avocados)
autoplot(model2b)
summary(model2b)
##
## Call:
## lm(formula = average_price ~ type + year, data = trimmed_avocados)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.32320 -0.18722 -0.01722 0.18278 1.66337
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.127645 0.004704 239.735 < 2e-16 ***
## typeorganic 0.495980 0.004563 108.685 < 2e-16 ***
## year2016 -0.036995 0.005817 -6.360 2.07e-10 ***
## year2017 0.139580 0.005790 24.107 < 2e-16 ***
## year2018 -0.028104 0.009499 -2.959 0.00309 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3082 on 18244 degrees of freedom
## Multiple R-squared: 0.4142, Adjusted R-squared: 0.4141
## F-statistic: 3225 on 4 and 18244 DF, p-value: < 2.2e-16
model2c <- lm(average_price ~ type + quarter, data = trimmed_avocados)
autoplot(model2c)
summary(model2c)
##
## Call:
## lm(formula = average_price ~ type + quarter, data = trimmed_avocados)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.11458 -0.20089 -0.02458 0.18542 1.54687
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.058626 0.004718 224.38 <2e-16 ***
## typeorganic 0.495958 0.004543 109.16 <2e-16 ***
## quarter2 0.068546 0.006282 10.91 <2e-16 ***
## quarter3 0.206308 0.006281 32.84 <2e-16 ***
## quarter4 0.152040 0.006237 24.38 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3069 on 18244 degrees of freedom
## Multiple R-squared: 0.4193, Adjusted R-squared: 0.4192
## F-statistic: 3294 on 4 and 18244 DF, p-value: < 2.2e-16
model2d <- lm(average_price ~ type + region, data = trimmed_avocados)
autoplot(model2d)
summary(model2d)
##
## Call:
## lm(formula = average_price ~ type + region, data = trimmed_avocados)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.09858 -0.16716 -0.01814 0.14692 1.51320
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.313079 0.014894 88.159 < 2e-16 ***
## typeorganic 0.495912 0.004017 123.452 < 2e-16 ***
## regionAtlanta -0.223077 0.020871 -10.688 < 2e-16 ***
## regionBaltimoreWashington -0.026805 0.020871 -1.284 0.19906
## regionBoise -0.212899 0.020871 -10.201 < 2e-16 ***
## regionBoston -0.030148 0.020871 -1.444 0.14863
## regionBuffaloRochester -0.044201 0.020871 -2.118 0.03421 *
## regionCalifornia -0.165710 0.020871 -7.940 2.15e-15 ***
## regionCharlotte 0.045000 0.020871 2.156 0.03109 *
## regionChicago -0.004260 0.020871 -0.204 0.83826
## regionCincinnatiDayton -0.351834 0.020871 -16.857 < 2e-16 ***
## regionColumbus -0.308254 0.020871 -14.769 < 2e-16 ***
## regionDallasFtWorth -0.475444 0.020871 -22.780 < 2e-16 ***
## regionDenver -0.342456 0.020871 -16.408 < 2e-16 ***
## regionDetroit -0.284941 0.020871 -13.652 < 2e-16 ***
## regionGrandRapids -0.056036 0.020871 -2.685 0.00726 **
## regionGreatLakes -0.222485 0.020871 -10.660 < 2e-16 ***
## regionHarrisburgScranton -0.047751 0.020871 -2.288 0.02216 *
## regionHartfordSpringfield 0.257604 0.020871 12.342 < 2e-16 ***
## regionHouston -0.513107 0.020871 -24.584 < 2e-16 ***
## regionIndianapolis -0.247041 0.020871 -11.836 < 2e-16 ***
## regionJacksonville -0.050089 0.020871 -2.400 0.01641 *
## regionLasVegas -0.180118 0.020871 -8.630 < 2e-16 ***
## regionLosAngeles -0.345030 0.020871 -16.531 < 2e-16 ***
## regionLouisville -0.274349 0.020871 -13.145 < 2e-16 ***
## regionMiamiFtLauderdale -0.132544 0.020871 -6.351 2.20e-10 ***
## regionMidsouth -0.156272 0.020871 -7.487 7.35e-14 ***
## regionNashville -0.348935 0.020871 -16.718 < 2e-16 ***
## regionNewOrleansMobile -0.256243 0.020871 -12.277 < 2e-16 ***
## regionNewYork 0.166538 0.020871 7.979 1.56e-15 ***
## regionNortheast 0.040888 0.020871 1.959 0.05013 .
## regionNorthernNewEngland -0.083639 0.020871 -4.007 6.16e-05 ***
## regionOrlando -0.054822 0.020871 -2.627 0.00863 **
## regionPhiladelphia 0.071095 0.020871 3.406 0.00066 ***
## regionPhoenixTucson -0.336598 0.020871 -16.127 < 2e-16 ***
## regionPittsburgh -0.196716 0.020871 -9.425 < 2e-16 ***
## regionPlains -0.124527 0.020871 -5.966 2.47e-09 ***
## regionPortland -0.243314 0.020871 -11.658 < 2e-16 ***
## regionRaleighGreensboro -0.005917 0.020871 -0.284 0.77679
## regionRichmondNorfolk -0.269704 0.020871 -12.922 < 2e-16 ***
## regionRoanoke -0.313107 0.020871 -15.002 < 2e-16 ***
## regionSacramento 0.060533 0.020871 2.900 0.00373 **
## regionSanDiego -0.162870 0.020871 -7.803 6.35e-15 ***
## regionSanFrancisco 0.243166 0.020871 11.651 < 2e-16 ***
## regionSeattle -0.118462 0.020871 -5.676 1.40e-08 ***
## regionSouthCarolina -0.157751 0.020871 -7.558 4.28e-14 ***
## regionSouthCentral -0.459793 0.020871 -22.030 < 2e-16 ***
## regionSoutheast -0.163018 0.020871 -7.811 6.00e-15 ***
## regionSpokane -0.115444 0.020871 -5.531 3.22e-08 ***
## regionStLouis -0.130414 0.020871 -6.248 4.24e-10 ***
## regionSyracuse -0.040710 0.020871 -1.951 0.05113 .
## regionTampa -0.152189 0.020871 -7.292 3.18e-13 ***
## regionTotalUS -0.242012 0.020871 -11.595 < 2e-16 ***
## regionWest -0.288817 0.020871 -13.838 < 2e-16 ***
## regionWestTexNewMexico -0.297114 0.020918 -14.204 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.2713 on 18194 degrees of freedom
## Multiple R-squared: 0.5473, Adjusted R-squared: 0.546
## F-statistic: 407.4 on 54 and 18194 DF, p-value: < 2.2e-16
So model2d with type and region comes out best here. Some of the region coefficients are not significant at the \(0.05\) level, so let’s run an anova() to test whether to include region as a whole.
# model1b is the model with average_price ~ type
# model2d is the model with average_price ~ type + region
# we want to compare the two
anova(model1b, model2d)
It seems region is significant overall, so we’ll keep it in!
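The anova() comparison is an F-test on all the extra coefficients at once, which is why a few non-significant region dummies don’t settle the question on their own. A sketch on mtcars (stand-in data, with cyl as a three-level factor) also recomputes the F statistic by hand from the two residual sums of squares:

```r
cars <- transform(mtcars, cyl = factor(cyl))

smaller <- lm(mpg ~ wt, data = cars)
larger  <- lm(mpg ~ wt + cyl, data = cars)  # adds two cyl dummies as one block

comparison <- anova(smaller, larger)
comparison

# The same F statistic by hand:
# F = ((RSS_small - RSS_large) / extra_terms) / (RSS_large / residual_df_large)
rss_small   <- sum(residuals(smaller)^2)
rss_large   <- sum(residuals(larger)^2)
extra_terms <- smaller$df.residual - larger$df.residual  # 2 extra coefficients
f_manual <- ((rss_small - rss_large) / extra_terms) / (rss_large / larger$df.residual)
f_manual
```

For model1b against model2d the same test covers all 53 region dummies jointly.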
Model2d is our model with average_price ~ type + region, and it explains about 54.7% of the variance in average price (R² = 0.5473). This isn’t very high, so we can think about adding a third predictor now. Again, we want to drop the predictors already in the model from our data and check the remaining variables against the residuals.
avocados_remaining_resid <- trimmed_avocados %>%
add_residuals(model2d) %>%
select(-c("average_price", "type", "region"))
ggpairs(avocados_remaining_resid) +
theme_grey(base_size = 8) # font size of labels
The next contender variables look to be x_large_bags, year and quarter. Let’s try them out.
model3a <- lm(average_price ~ type + region + x_large_bags, data = trimmed_avocados)
autoplot(model3a)
summary(model3a)
##
## Call:
## lm(formula = average_price ~ type + region + x_large_bags, data = trimmed_avocados)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.10024 -0.16726 -0.01734 0.14591 1.51156
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.311e+00 1.489e-02 88.033 < 2e-16 ***
## typeorganic 5.001e-01 4.101e-03 121.953 < 2e-16 ***
## regionAtlanta -2.235e-01 2.086e-02 -10.718 < 2e-16 ***
## regionBaltimoreWashington -2.713e-02 2.086e-02 -1.301 0.193298
## regionBoise -2.128e-01 2.086e-02 -10.204 < 2e-16 ***
## regionBoston -3.023e-02 2.086e-02 -1.449 0.147234
## regionBuffaloRochester -4.428e-02 2.086e-02 -2.123 0.033774 *
## regionCalifornia -1.762e-01 2.096e-02 -8.408 < 2e-16 ***
## regionCharlotte 4.495e-02 2.086e-02 2.155 0.031177 *
## regionChicago -4.936e-03 2.086e-02 -0.237 0.812924
## regionCincinnatiDayton -3.523e-01 2.086e-02 -16.890 < 2e-16 ***
## regionColumbus -3.086e-01 2.086e-02 -14.796 < 2e-16 ***
## regionDallasFtWorth -4.762e-01 2.086e-02 -22.832 < 2e-16 ***
## regionDenver -3.425e-01 2.086e-02 -16.420 < 2e-16 ***
## regionDetroit -2.882e-01 2.087e-02 -13.810 < 2e-16 ***
## regionGrandRapids -5.764e-02 2.086e-02 -2.763 0.005731 **
## regionGreatLakes -2.353e-01 2.101e-02 -11.198 < 2e-16 ***
## regionHarrisburgScranton -4.798e-02 2.086e-02 -2.300 0.021451 *
## regionHartfordSpringfield 2.575e-01 2.086e-02 12.347 < 2e-16 ***
## regionHouston -5.137e-01 2.086e-02 -24.628 < 2e-16 ***
## regionIndianapolis -2.475e-01 2.086e-02 -11.867 < 2e-16 ***
## regionJacksonville -5.021e-02 2.086e-02 -2.407 0.016074 *
## regionLasVegas -1.801e-01 2.086e-02 -8.633 < 2e-16 ***
## regionLosAngeles -3.532e-01 2.092e-02 -16.881 < 2e-16 ***
## regionLouisville -2.745e-01 2.086e-02 -13.160 < 2e-16 ***
## regionMiamiFtLauderdale -1.331e-01 2.086e-02 -6.380 1.81e-10 ***
## regionMidsouth -1.590e-01 2.086e-02 -7.619 2.68e-14 ***
## regionNashville -3.491e-01 2.086e-02 -16.736 < 2e-16 ***
## regionNewOrleansMobile -2.572e-01 2.086e-02 -12.330 < 2e-16 ***
## regionNewYork 1.659e-01 2.086e-02 7.954 1.91e-15 ***
## regionNortheast 3.834e-02 2.086e-02 1.838 0.066151 .
## regionNorthernNewEngland -8.377e-02 2.086e-02 -4.017 5.93e-05 ***
## regionOrlando -5.523e-02 2.086e-02 -2.648 0.008111 **
## regionPhiladelphia 7.097e-02 2.086e-02 3.403 0.000669 ***
## regionPhoenixTucson -3.368e-01 2.086e-02 -16.149 < 2e-16 ***
## regionPittsburgh -1.967e-01 2.086e-02 -9.433 < 2e-16 ***
## regionPlains -1.267e-01 2.086e-02 -6.072 1.29e-09 ***
## regionPortland -2.434e-01 2.086e-02 -11.669 < 2e-16 ***
## regionRaleighGreensboro -6.021e-03 2.086e-02 -0.289 0.772828
## regionRichmondNorfolk -2.699e-01 2.086e-02 -12.939 < 2e-16 ***
## regionRoanoke -3.132e-01 2.086e-02 -15.015 < 2e-16 ***
## regionSacramento 6.020e-02 2.086e-02 2.886 0.003904 **
## regionSanDiego -1.631e-01 2.086e-02 -7.819 5.64e-15 ***
## regionSanFrancisco 2.428e-01 2.086e-02 11.642 < 2e-16 ***
## regionSeattle -1.185e-01 2.086e-02 -5.682 1.35e-08 ***
## regionSouthCarolina -1.581e-01 2.086e-02 -7.581 3.59e-14 ***
## regionSouthCentral -4.650e-01 2.088e-02 -22.268 < 2e-16 ***
## regionSoutheast -1.680e-01 2.088e-02 -8.046 9.10e-16 ***
## regionSpokane -1.154e-01 2.086e-02 -5.531 3.22e-08 ***
## regionStLouis -1.308e-01 2.086e-02 -6.270 3.69e-10 ***
## regionSyracuse -4.071e-02 2.086e-02 -1.952 0.050993 .
## regionTampa -1.526e-01 2.086e-02 -7.315 2.68e-13 ***
## regionTotalUS -2.852e-01 2.255e-02 -12.648 < 2e-16 ***
## regionWest -2.904e-01 2.086e-02 -13.922 < 2e-16 ***
## regionWestTexNewMexico -2.976e-01 2.090e-02 -14.238 < 2e-16 ***
## x_large_bags 6.810e-07 1.351e-07 5.040 4.70e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.2711 on 18193 degrees of freedom
## Multiple R-squared: 0.548, Adjusted R-squared: 0.5466
## F-statistic: 401 on 55 and 18193 DF, p-value: < 2.2e-16
model3b <- lm(average_price ~ type + region + year, data = trimmed_avocados)
autoplot(model3b)
summary(model3b)
##
## Call:
## lm(formula = average_price ~ type + region + year, data = trimmed_avocados)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.1532 -0.1497 -0.0060 0.1419 1.4849
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.282672 0.014600 87.857 < 2e-16 ***
## typeorganic 0.495933 0.003859 128.501 < 2e-16 ***
## regionAtlanta -0.223077 0.020052 -11.125 < 2e-16 ***
## regionBaltimoreWashington -0.026805 0.020052 -1.337 0.181322
## regionBoise -0.212899 0.020052 -10.617 < 2e-16 ***
## regionBoston -0.030148 0.020052 -1.503 0.132735
## regionBuffaloRochester -0.044201 0.020052 -2.204 0.027515 *
## regionCalifornia -0.165710 0.020052 -8.264 < 2e-16 ***
## regionCharlotte 0.045000 0.020052 2.244 0.024835 *
## regionChicago -0.004260 0.020052 -0.212 0.831748
## regionCincinnatiDayton -0.351834 0.020052 -17.546 < 2e-16 ***
## regionColumbus -0.308254 0.020052 -15.373 < 2e-16 ***
## regionDallasFtWorth -0.475444 0.020052 -23.710 < 2e-16 ***
## regionDenver -0.342456 0.020052 -17.078 < 2e-16 ***
## regionDetroit -0.284941 0.020052 -14.210 < 2e-16 ***
## regionGrandRapids -0.056036 0.020052 -2.794 0.005204 **
## regionGreatLakes -0.222485 0.020052 -11.095 < 2e-16 ***
## regionHarrisburgScranton -0.047751 0.020052 -2.381 0.017259 *
## regionHartfordSpringfield 0.257604 0.020052 12.847 < 2e-16 ***
## regionHouston -0.513107 0.020052 -25.589 < 2e-16 ***
## regionIndianapolis -0.247041 0.020052 -12.320 < 2e-16 ***
## regionJacksonville -0.050089 0.020052 -2.498 0.012501 *
## regionLasVegas -0.180118 0.020052 -8.982 < 2e-16 ***
## regionLosAngeles -0.345030 0.020052 -17.207 < 2e-16 ***
## regionLouisville -0.274349 0.020052 -13.682 < 2e-16 ***
## regionMiamiFtLauderdale -0.132544 0.020052 -6.610 3.95e-11 ***
## regionMidsouth -0.156272 0.020052 -7.793 6.88e-15 ***
## regionNashville -0.348935 0.020052 -17.401 < 2e-16 ***
## regionNewOrleansMobile -0.256243 0.020052 -12.779 < 2e-16 ***
## regionNewYork 0.166538 0.020052 8.305 < 2e-16 ***
## regionNortheast 0.040888 0.020052 2.039 0.041459 *
## regionNorthernNewEngland -0.083639 0.020052 -4.171 3.05e-05 ***
## regionOrlando -0.054822 0.020052 -2.734 0.006263 **
## regionPhiladelphia 0.071095 0.020052 3.545 0.000393 ***
## regionPhoenixTucson -0.336598 0.020052 -16.786 < 2e-16 ***
## regionPittsburgh -0.196716 0.020052 -9.810 < 2e-16 ***
## regionPlains -0.124527 0.020052 -6.210 5.41e-10 ***
## regionPortland -0.243314 0.020052 -12.134 < 2e-16 ***
## regionRaleighGreensboro -0.005917 0.020052 -0.295 0.767930
## regionRichmondNorfolk -0.269704 0.020052 -13.450 < 2e-16 ***
## regionRoanoke -0.313107 0.020052 -15.615 < 2e-16 ***
## regionSacramento 0.060533 0.020052 3.019 0.002542 **
## regionSanDiego -0.162870 0.020052 -8.122 4.86e-16 ***
## regionSanFrancisco 0.243166 0.020052 12.127 < 2e-16 ***
## regionSeattle -0.118462 0.020052 -5.908 3.53e-09 ***
## regionSouthCarolina -0.157751 0.020052 -7.867 3.83e-15 ***
## regionSouthCentral -0.459793 0.020052 -22.930 < 2e-16 ***
## regionSoutheast -0.163018 0.020052 -8.130 4.58e-16 ***
## regionSpokane -0.115444 0.020052 -5.757 8.69e-09 ***
## regionStLouis -0.130414 0.020052 -6.504 8.04e-11 ***
## regionSyracuse -0.040710 0.020052 -2.030 0.042350 *
## regionTampa -0.152189 0.020052 -7.590 3.36e-14 ***
## regionTotalUS -0.242012 0.020052 -12.069 < 2e-16 ***
## regionWest -0.288817 0.020052 -14.403 < 2e-16 ***
## regionWestTexNewMexico -0.296552 0.020097 -14.756 < 2e-16 ***
## year2016 -0.036970 0.004920 -7.515 5.96e-14 ***
## year2017 0.139555 0.004897 28.500 < 2e-16 ***
## year2018 -0.028078 0.008033 -3.495 0.000475 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.2607 on 18191 degrees of freedom
## Multiple R-squared: 0.5822, Adjusted R-squared: 0.5809
## F-statistic: 444.8 on 57 and 18191 DF, p-value: < 2.2e-16
model3c <- lm(average_price ~ type + region + quarter, data = trimmed_avocados)
autoplot(model3c)
summary(model3c)
##
## Call:
## lm(formula = average_price ~ type + region + quarter, data = trimmed_avocados)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.06767 -0.15971 -0.01185 0.14629 1.54411
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.213689 0.014517 83.603 < 2e-16 ***
## typeorganic 0.495911 0.003835 129.296 < 2e-16 ***
## regionAtlanta -0.223077 0.019928 -11.194 < 2e-16 ***
## regionBaltimoreWashington -0.026805 0.019928 -1.345 0.178619
## regionBoise -0.212899 0.019928 -10.683 < 2e-16 ***
## regionBoston -0.030148 0.019928 -1.513 0.130339
## regionBuffaloRochester -0.044201 0.019928 -2.218 0.026565 *
## regionCalifornia -0.165710 0.019928 -8.315 < 2e-16 ***
## regionCharlotte 0.045000 0.019928 2.258 0.023950 *
## regionChicago -0.004260 0.019928 -0.214 0.830716
## regionCincinnatiDayton -0.351834 0.019928 -17.655 < 2e-16 ***
## regionColumbus -0.308254 0.019928 -15.468 < 2e-16 ***
## regionDallasFtWorth -0.475444 0.019928 -23.858 < 2e-16 ***
## regionDenver -0.342456 0.019928 -17.185 < 2e-16 ***
## regionDetroit -0.284941 0.019928 -14.298 < 2e-16 ***
## regionGrandRapids -0.056036 0.019928 -2.812 0.004931 **
## regionGreatLakes -0.222485 0.019928 -11.164 < 2e-16 ***
## regionHarrisburgScranton -0.047751 0.019928 -2.396 0.016577 *
## regionHartfordSpringfield 0.257604 0.019928 12.927 < 2e-16 ***
## regionHouston -0.513107 0.019928 -25.748 < 2e-16 ***
## regionIndianapolis -0.247041 0.019928 -12.397 < 2e-16 ***
## regionJacksonville -0.050089 0.019928 -2.513 0.011963 *
## regionLasVegas -0.180118 0.019928 -9.038 < 2e-16 ***
## regionLosAngeles -0.345030 0.019928 -17.314 < 2e-16 ***
## regionLouisville -0.274349 0.019928 -13.767 < 2e-16 ***
## regionMiamiFtLauderdale -0.132544 0.019928 -6.651 2.99e-11 ***
## regionMidsouth -0.156272 0.019928 -7.842 4.69e-15 ***
## regionNashville -0.348935 0.019928 -17.510 < 2e-16 ***
## regionNewOrleansMobile -0.256243 0.019928 -12.858 < 2e-16 ***
## regionNewYork 0.166538 0.019928 8.357 < 2e-16 ***
## regionNortheast 0.040888 0.019928 2.052 0.040208 *
## regionNorthernNewEngland -0.083639 0.019928 -4.197 2.72e-05 ***
## regionOrlando -0.054822 0.019928 -2.751 0.005947 **
## regionPhiladelphia 0.071095 0.019928 3.568 0.000361 ***
## regionPhoenixTucson -0.336598 0.019928 -16.891 < 2e-16 ***
## regionPittsburgh -0.196716 0.019928 -9.871 < 2e-16 ***
## regionPlains -0.124527 0.019928 -6.249 4.23e-10 ***
## regionPortland -0.243314 0.019928 -12.210 < 2e-16 ***
## regionRaleighGreensboro -0.005917 0.019928 -0.297 0.766527
## regionRichmondNorfolk -0.269704 0.019928 -13.534 < 2e-16 ***
## regionRoanoke -0.313107 0.019928 -15.712 < 2e-16 ***
## regionSacramento 0.060533 0.019928 3.038 0.002389 **
## regionSanDiego -0.162870 0.019928 -8.173 3.21e-16 ***
## regionSanFrancisco 0.243166 0.019928 12.202 < 2e-16 ***
## regionSeattle -0.118462 0.019928 -5.944 2.82e-09 ***
## regionSouthCarolina -0.157751 0.019928 -7.916 2.59e-15 ***
## regionSouthCentral -0.459793 0.019928 -23.073 < 2e-16 ***
## regionSoutheast -0.163018 0.019928 -8.180 3.02e-16 ***
## regionSpokane -0.115444 0.019928 -5.793 7.03e-09 ***
## regionStLouis -0.130414 0.019928 -6.544 6.14e-11 ***
## regionSyracuse -0.040710 0.019928 -2.043 0.041082 *
## regionTampa -0.152189 0.019928 -7.637 2.33e-14 ***
## regionTotalUS -0.242012 0.019928 -12.144 < 2e-16 ***
## regionWest -0.288817 0.019928 -14.493 < 2e-16 ***
## regionWestTexNewMexico -0.297141 0.019973 -14.877 < 2e-16 ***
## quarter2 0.068479 0.005303 12.912 < 2e-16 ***
## quarter3 0.206308 0.005303 38.906 < 2e-16 ***
## quarter4 0.152007 0.005265 28.869 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.2591 on 18191 degrees of freedom
## Multiple R-squared: 0.5874, Adjusted R-squared: 0.5861
## F-statistic: 454.3 on 57 and 18191 DF, p-value: < 2.2e-16
So model3c with type, region and quarter wins out here (adjusted R² of 0.5861, against 0.5809 with year and 0.5466 with x_large_bags). The diagnostics still look reasonable, with perhaps some mild heteroscedasticity.
Remember that with two predictors our R² was 0.5473. Now, with three predictors, we are at 0.5874. That seems a reasonable improvement, so let’s see how much we gain by adding a fourth variable. Again, check the residuals to see which variables we should try adding.
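Because R² can only go up as terms are added, the adjusted R² reported alongside it is the fairer yardstick when the candidate models have different numbers of coefficients. It scales the unexplained share by the degrees of freedom, with n observations and p slope terms. Plugging in model3c’s numbers (n = 18249, i.e. 18191 residual degrees of freedom + 57 slope terms + 1 intercept, and p = 57) reproduces the value in its summary:

\[
\bar{R}^2 = 1 - (1 - R^2)\,\frac{n - 1}{n - p - 1}
\]

\[
\bar{R}^2_{\text{model3c}} = 1 - (1 - 0.5874)\,\frac{18248}{18191} \approx 1 - 0.4126 \times 1.00313 \approx 0.5861
\]

The same calculation for model2d (p = 54) gives \(1 - 0.4527 \times 18248/18194 \approx 0.5460\), so the gain from adding quarter survives the penalty for its three extra coefficients.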
avocados_remaining_resid <- trimmed_avocados %>%
add_residuals(model3c) %>%
select(-c("average_price", "type", "region", "quarter"))
ggpairs(avocados_remaining_resid) +
theme_grey(base_size = 8) # font size of labels
The contender variables here are x_large_bags and year, so let’s try them out.
model4a <- lm(average_price ~ type + region + quarter + x_large_bags, data = trimmed_avocados)
autoplot(model4a)
summary(model4a)
##
## Call:
## lm(formula = average_price ~ type + region + quarter + x_large_bags,
## data = trimmed_avocados)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.06889 -0.16013 -0.01154 0.14553 1.54291
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.212e+00 1.451e-02 83.493 < 2e-16 ***
## typeorganic 4.998e-01 3.916e-03 127.614 < 2e-16 ***
## regionAtlanta -2.235e-01 1.992e-02 -11.222 < 2e-16 ***
## regionBaltimoreWashington -2.711e-02 1.992e-02 -1.361 0.173535
## regionBoise -2.128e-01 1.992e-02 -10.687 < 2e-16 ***
## regionBoston -3.022e-02 1.992e-02 -1.518 0.129137
## regionBuffaloRochester -4.427e-02 1.992e-02 -2.223 0.026233 *
## regionCalifornia -1.753e-01 2.002e-02 -8.759 < 2e-16 ***
## regionCharlotte 4.495e-02 1.992e-02 2.257 0.024015 *
## regionChicago -4.877e-03 1.992e-02 -0.245 0.806549
## regionCincinnatiDayton -3.522e-01 1.992e-02 -17.686 < 2e-16 ***
## regionColumbus -3.086e-01 1.992e-02 -15.494 < 2e-16 ***
## regionDallasFtWorth -4.762e-01 1.992e-02 -23.908 < 2e-16 ***
## regionDenver -3.425e-01 1.992e-02 -17.196 < 2e-16 ***
## regionDetroit -2.879e-01 1.993e-02 -14.449 < 2e-16 ***
## regionGrandRapids -5.750e-02 1.992e-02 -2.887 0.003898 **
## regionGreatLakes -2.342e-01 2.006e-02 -11.671 < 2e-16 ***
## regionHarrisburgScranton -4.796e-02 1.992e-02 -2.408 0.016054 *
## regionHartfordSpringfield 2.575e-01 1.992e-02 12.931 < 2e-16 ***
## regionHouston -5.136e-01 1.992e-02 -25.789 < 2e-16 ***
## regionIndianapolis -2.475e-01 1.992e-02 -12.426 < 2e-16 ***
## regionJacksonville -5.020e-02 1.992e-02 -2.521 0.011720 *
## regionLasVegas -1.801e-01 1.992e-02 -9.041 < 2e-16 ***
## regionLosAngeles -3.524e-01 1.998e-02 -17.644 < 2e-16 ***
## regionLouisville -2.745e-01 1.992e-02 -13.781 < 2e-16 ***
## regionMiamiFtLauderdale -1.330e-01 1.992e-02 -6.679 2.47e-11 ***
## regionMidsouth -1.587e-01 1.992e-02 -7.967 1.72e-15 ***
## regionNashville -3.491e-01 1.992e-02 -17.527 < 2e-16 ***
## regionNewOrleansMobile -2.571e-01 1.992e-02 -12.909 < 2e-16 ***
## regionNewYork 1.660e-01 1.992e-02 8.333 < 2e-16 ***
## regionNortheast 3.856e-02 1.992e-02 1.936 0.052939 .
## regionNorthernNewEngland -8.376e-02 1.992e-02 -4.206 2.61e-05 ***
## regionOrlando -5.519e-02 1.992e-02 -2.771 0.005592 **
## regionPhiladelphia 7.098e-02 1.992e-02 3.564 0.000366 ***
## regionPhoenixTucson -3.368e-01 1.992e-02 -16.911 < 2e-16 ***
## regionPittsburgh -1.967e-01 1.992e-02 -9.879 < 2e-16 ***
## regionPlains -1.265e-01 1.992e-02 -6.350 2.20e-10 ***
## regionPortland -2.434e-01 1.992e-02 -12.220 < 2e-16 ***
## regionRaleighGreensboro -6.012e-03 1.992e-02 -0.302 0.762753
## regionRichmondNorfolk -2.699e-01 1.992e-02 -13.549 < 2e-16 ***
## regionRoanoke -3.132e-01 1.992e-02 -15.725 < 2e-16 ***
## regionSacramento 6.023e-02 1.992e-02 3.024 0.002497 **
## regionSanDiego -1.631e-01 1.992e-02 -8.187 2.85e-16 ***
## regionSanFrancisco 2.429e-01 1.992e-02 12.194 < 2e-16 ***
## regionSeattle -1.185e-01 1.992e-02 -5.950 2.72e-09 ***
## regionSouthCarolina -1.581e-01 1.992e-02 -7.938 2.18e-15 ***
## regionSouthCentral -4.646e-01 1.994e-02 -23.297 < 2e-16 ***
## regionSoutheast -1.676e-01 1.994e-02 -8.404 < 2e-16 ***
## regionSpokane -1.154e-01 1.992e-02 -5.793 7.02e-09 ***
## regionStLouis -1.307e-01 1.992e-02 -6.565 5.35e-11 ***
## regionSyracuse -4.071e-02 1.992e-02 -2.044 0.040974 *
## regionTampa -1.525e-01 1.992e-02 -7.659 1.96e-14 ***
## regionTotalUS -2.814e-01 2.153e-02 -13.068 < 2e-16 ***
## regionWest -2.903e-01 1.992e-02 -14.573 < 2e-16 ***
## regionWestTexNewMexico -2.976e-01 1.996e-02 -14.910 < 2e-16 ***
## quarter2 6.806e-02 5.301e-03 12.839 < 2e-16 ***
## quarter3 2.055e-01 5.302e-03 38.761 < 2e-16 ***
## quarter4 1.527e-01 5.264e-03 29.001 < 2e-16 ***
## x_large_bags 6.215e-07 1.292e-07 4.810 1.52e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.2589 on 18190 degrees of freedom
## Multiple R-squared: 0.5879, Adjusted R-squared: 0.5866
## F-statistic: 447.4 on 58 and 18190 DF, p-value: < 2.2e-16
model4b <- lm(average_price ~ type + region + quarter + year, data = trimmed_avocados)
autoplot(model4b)
summary(model4b)
##
## Call:
## lm(formula = average_price ~ type + region + quarter + year,
## data = trimmed_avocados)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.03683 -0.14588 -0.00412 0.14386 1.43930
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.167184 0.014290 81.677 < 2e-16 ***
## typeorganic 0.495930 0.003675 134.950 < 2e-16 ***
## regionAtlanta -0.223077 0.019094 -11.683 < 2e-16 ***
## regionBaltimoreWashington -0.026805 0.019094 -1.404 0.160383
## regionBoise -0.212899 0.019094 -11.150 < 2e-16 ***
## regionBoston -0.030148 0.019094 -1.579 0.114368
## regionBuffaloRochester -0.044201 0.019094 -2.315 0.020627 *
## regionCalifornia -0.165710 0.019094 -8.679 < 2e-16 ***
## regionCharlotte 0.045000 0.019094 2.357 0.018445 *
## regionChicago -0.004260 0.019094 -0.223 0.823439
## regionCincinnatiDayton -0.351834 0.019094 -18.427 < 2e-16 ***
## regionColumbus -0.308254 0.019094 -16.144 < 2e-16 ***
## regionDallasFtWorth -0.475444 0.019094 -24.900 < 2e-16 ***
## regionDenver -0.342456 0.019094 -17.935 < 2e-16 ***
## regionDetroit -0.284941 0.019094 -14.923 < 2e-16 ***
## regionGrandRapids -0.056036 0.019094 -2.935 0.003342 **
## regionGreatLakes -0.222485 0.019094 -11.652 < 2e-16 ***
## regionHarrisburgScranton -0.047751 0.019094 -2.501 0.012397 *
## regionHartfordSpringfield 0.257604 0.019094 13.491 < 2e-16 ***
## regionHouston -0.513107 0.019094 -26.873 < 2e-16 ***
## regionIndianapolis -0.247041 0.019094 -12.938 < 2e-16 ***
## regionJacksonville -0.050089 0.019094 -2.623 0.008716 **
## regionLasVegas -0.180118 0.019094 -9.433 < 2e-16 ***
## regionLosAngeles -0.345030 0.019094 -18.070 < 2e-16 ***
## regionLouisville -0.274349 0.019094 -14.368 < 2e-16 ***
## regionMiamiFtLauderdale -0.132544 0.019094 -6.942 4.00e-12 ***
## regionMidsouth -0.156272 0.019094 -8.184 2.91e-16 ***
## regionNashville -0.348935 0.019094 -18.275 < 2e-16 ***
## regionNewOrleansMobile -0.256243 0.019094 -13.420 < 2e-16 ***
## regionNewYork 0.166538 0.019094 8.722 < 2e-16 ***
## regionNortheast 0.040888 0.019094 2.141 0.032255 *
## regionNorthernNewEngland -0.083639 0.019094 -4.380 1.19e-05 ***
## regionOrlando -0.054822 0.019094 -2.871 0.004094 **
## regionPhiladelphia 0.071095 0.019094 3.723 0.000197 ***
## regionPhoenixTucson -0.336598 0.019094 -17.629 < 2e-16 ***
## regionPittsburgh -0.196716 0.019094 -10.303 < 2e-16 ***
## regionPlains -0.124527 0.019094 -6.522 7.13e-11 ***
## regionPortland -0.243314 0.019094 -12.743 < 2e-16 ***
## regionRaleighGreensboro -0.005917 0.019094 -0.310 0.756641
## regionRichmondNorfolk -0.269704 0.019094 -14.125 < 2e-16 ***
## regionRoanoke -0.313107 0.019094 -16.398 < 2e-16 ***
## regionSacramento 0.060533 0.019094 3.170 0.001526 **
## regionSanDiego -0.162870 0.019094 -8.530 < 2e-16 ***
## regionSanFrancisco 0.243166 0.019094 12.735 < 2e-16 ***
## regionSeattle -0.118462 0.019094 -6.204 5.62e-10 ***
## regionSouthCarolina -0.157751 0.019094 -8.262 < 2e-16 ***
## regionSouthCentral -0.459793 0.019094 -24.081 < 2e-16 ***
## regionSoutheast -0.163018 0.019094 -8.538 < 2e-16 ***
## regionSpokane -0.115444 0.019094 -6.046 1.51e-09 ***
## regionStLouis -0.130414 0.019094 -6.830 8.75e-12 ***
## regionSyracuse -0.040710 0.019094 -2.132 0.033011 *
## regionTampa -0.152189 0.019094 -7.971 1.67e-15 ***
## regionTotalUS -0.242012 0.019094 -12.675 < 2e-16 ***
## regionWest -0.288817 0.019094 -15.126 < 2e-16 ***
## regionWestTexNewMexico -0.296624 0.019137 -15.500 < 2e-16 ***
## quarter2 0.081121 0.005410 14.996 < 2e-16 ***
## quarter3 0.218901 0.005409 40.471 < 2e-16 ***
## quarter4 0.161972 0.005376 30.130 < 2e-16 ***
## year2016 -0.036978 0.004684 -7.894 3.10e-15 ***
## year2017 0.138658 0.004663 29.735 < 2e-16 ***
## year2018 0.087412 0.008334 10.488 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.2482 on 18188 degrees of freedom
## Multiple R-squared: 0.6213, Adjusted R-squared: 0.62
## F-statistic: 497.3 on 60 and 18188 DF, p-value: < 2.2e-16
Hmm, model4b (type, region, quarter and year) wins here; model4a with x_large_bags only reached 0.5879. Adding year improves our multiple R^2 from 0.5874 (with three predictors) to 0.6213, which is quite good.
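Because model3c is nested inside model4b (model4b simply adds year), the jump in R^2 can also be tested formally with an F-test, e.g. `anova(model3c, model4b)`. A self-contained sketch on simulated data, where the data frame `sim` and its columns are hypothetical stand-ins:

```r
set.seed(7)
n <- 500
sim <- data.frame(
  quarter = factor(sample(1:4, n, replace = TRUE)),
  year    = factor(sample(2015:2018, n, replace = TRUE))
)
# Both factors shift the mean response
sim$y <- 1 + 0.1 * as.integer(sim$quarter) + 0.2 * (sim$year == "2017") +
  rnorm(n, sd = 0.25)

smaller <- lm(y ~ quarter, data = sim)          # plays the role of model3c
larger  <- lm(y ~ quarter + year, data = sim)   # plays the role of model4b

comparison <- anova(smaller, larger)  # F-test: do the three extra year dummies help?
comparison

# R^2 can only go up when terms are added to a nested OLS model
c(smaller = summary(smaller)$r.squared, larger = summary(larger)$r.squared)
```

The F-test accounts for the number of extra coefficients (three dummies for a four-level year), which a raw R^2 comparison does not.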
We are now likely pursuing variables with rather limited explanatory power, but let’s check for one more main effect and see how much it adds.
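If the aim is prediction rather than explanation, k-fold cross-validation gives a more direct answer than in-sample R^2. A minimal base-R sketch follows; `cv_rmse()` is a hypothetical helper, and `sim` is simulated data in which `x_extra` has no real effect. The modelr package loaded at the top of the script also provides `crossv_kfold()` for building the folds.

```r
set.seed(123)
n <- 1000
sim <- data.frame(
  type    = factor(sample(c("conventional", "organic"), n, replace = TRUE)),
  x_extra = rnorm(n)
)
# x_extra has no real effect on the response
sim$average_price <- 1.2 + 0.5 * (sim$type == "organic") + rnorm(n, sd = 0.25)

# Hypothetical helper: k-fold cross-validated RMSE of an lm formula, given fold labels
cv_rmse <- function(formula, data, folds) {
  response <- all.vars(formula)[1]          # name of the response variable
  sq_err <- numeric(0)
  for (k in sort(unique(folds))) {
    train <- data[folds != k, ]
    test  <- data[folds == k, ]
    fit   <- lm(formula, data = train)       # fit on the other folds
    pred  <- predict(fit, newdata = test)    # predict the held-out fold
    sq_err <- c(sq_err, (test[[response]] - pred)^2)
  }
  sqrt(mean(sq_err))
}

# Use the same 10 folds for every candidate so the comparison is fair
folds <- sample(rep(1:10, length.out = n))
rmse_small <- cv_rmse(average_price ~ type, sim, folds)
rmse_big   <- cv_rmse(average_price ~ type + x_extra, sim, folds)
c(small = rmse_small, big = rmse_big)
```

An irrelevant predictor leaves the cross-validated RMSE essentially unchanged (or slightly worse), even though in-sample R^2 can only increase.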
avocados_remaining_resid <- trimmed_avocados %>%
add_residuals(model4b) %>%
select(-c("average_price", "type", "region", "quarter", "year"))
ggpairs(avocados_remaining_resid) +
theme_grey(base_size = 8) # font size of labels
It looks like x_large_bags is the remaining contender, so let’s check it out!
model5 <- lm(average_price ~ type + region + quarter + year + x_large_bags, data = trimmed_avocados)
autoplot(model5)
summary(model5)
##
## Call:
## lm(formula = average_price ~ type + region + quarter + year +
## x_large_bags, data = trimmed_avocados)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.03610 -0.14545 -0.00439 0.14420 1.43907
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.167e+00 1.429e-02 81.687 < 2e-16 ***
## typeorganic 4.982e-01 3.755e-03 132.674 < 2e-16 ***
## regionAtlanta -2.233e-01 1.909e-02 -11.698 < 2e-16 ***
## regionBaltimoreWashington -2.698e-02 1.909e-02 -1.413 0.157614
## regionBoise -2.129e-01 1.909e-02 -11.151 < 2e-16 ***
## regionBoston -3.019e-02 1.909e-02 -1.582 0.113769
## regionBuffaloRochester -4.424e-02 1.909e-02 -2.318 0.020485 *
## regionCalifornia -1.713e-01 1.919e-02 -8.925 < 2e-16 ***
## regionCharlotte 4.497e-02 1.909e-02 2.356 0.018493 *
## regionChicago -4.616e-03 1.909e-02 -0.242 0.808941
## regionCincinnatiDayton -3.521e-01 1.909e-02 -18.442 < 2e-16 ***
## regionColumbus -3.084e-01 1.909e-02 -16.157 < 2e-16 ***
## regionDallasFtWorth -4.759e-01 1.909e-02 -24.926 < 2e-16 ***
## regionDenver -3.425e-01 1.909e-02 -17.940 < 2e-16 ***
## regionDetroit -2.866e-01 1.910e-02 -15.008 < 2e-16 ***
## regionGrandRapids -5.688e-02 1.909e-02 -2.979 0.002894 **
## regionGreatLakes -2.292e-01 1.923e-02 -11.918 < 2e-16 ***
## regionHarrisburgScranton -4.787e-02 1.909e-02 -2.508 0.012166 *
## regionHartfordSpringfield 2.576e-01 1.909e-02 13.492 < 2e-16 ***
## regionHouston -5.134e-01 1.909e-02 -26.894 < 2e-16 ***
## regionIndianapolis -2.473e-01 1.909e-02 -12.954 < 2e-16 ***
## regionJacksonville -5.015e-02 1.909e-02 -2.627 0.008615 **
## regionLasVegas -1.801e-01 1.909e-02 -9.434 < 2e-16 ***
## regionLosAngeles -3.493e-01 1.915e-02 -18.243 < 2e-16 ***
## regionLouisville -2.744e-01 1.909e-02 -14.375 < 2e-16 ***
## regionMiamiFtLauderdale -1.328e-01 1.909e-02 -6.958 3.58e-12 ***
## regionMidsouth -1.577e-01 1.910e-02 -8.257 < 2e-16 ***
## regionNashville -3.490e-01 1.909e-02 -18.282 < 2e-16 ***
## regionNewOrleansMobile -2.567e-01 1.909e-02 -13.448 < 2e-16 ***
## regionNewYork 1.662e-01 1.909e-02 8.706 < 2e-16 ***
## regionNortheast 3.955e-02 1.910e-02 2.071 0.038381 *
## regionNorthernNewEngland -8.371e-02 1.909e-02 -4.385 1.17e-05 ***
## regionOrlando -5.503e-02 1.909e-02 -2.883 0.003945 **
## regionPhiladelphia 7.103e-02 1.909e-02 3.721 0.000199 ***
## regionPhoenixTucson -3.367e-01 1.909e-02 -17.638 < 2e-16 ***
## regionPittsburgh -1.967e-01 1.909e-02 -10.305 < 2e-16 ***
## regionPlains -1.257e-01 1.909e-02 -6.581 4.80e-11 ***
## regionPortland -2.434e-01 1.909e-02 -12.748 < 2e-16 ***
## regionRaleighGreensboro -5.972e-03 1.909e-02 -0.313 0.754415
## regionRichmondNorfolk -2.698e-01 1.909e-02 -14.132 < 2e-16 ***
## regionRoanoke -3.131e-01 1.909e-02 -16.404 < 2e-16 ***
## regionSacramento 6.036e-02 1.909e-02 3.162 0.001571 **
## regionSanDiego -1.630e-01 1.909e-02 -8.537 < 2e-16 ***
## regionSanFrancisco 2.430e-01 1.909e-02 12.728 < 2e-16 ***
## regionSeattle -1.185e-01 1.909e-02 -6.207 5.52e-10 ***
## regionSouthCarolina -1.579e-01 1.909e-02 -8.274 < 2e-16 ***
## regionSouthCentral -4.625e-01 1.911e-02 -24.199 < 2e-16 ***
## regionSoutheast -1.656e-01 1.911e-02 -8.667 < 2e-16 ***
## regionSpokane -1.154e-01 1.909e-02 -6.045 1.52e-09 ***
## regionStLouis -1.306e-01 1.909e-02 -6.842 8.08e-12 ***
## regionSyracuse -4.071e-02 1.909e-02 -2.132 0.032984 *
## regionTampa -1.524e-01 1.909e-02 -7.983 1.52e-15 ***
## regionTotalUS -2.647e-01 2.066e-02 -12.815 < 2e-16 ***
## regionWest -2.897e-01 1.909e-02 -15.171 < 2e-16 ***
## regionWestTexNewMexico -2.969e-01 1.913e-02 -15.518 < 2e-16 ***
## quarter2 8.058e-02 5.412e-03 14.891 < 2e-16 ***
## quarter3 2.181e-01 5.414e-03 40.293 < 2e-16 ***
## quarter4 1.621e-01 5.375e-03 30.154 < 2e-16 ***
## year2016 -3.791e-02 4.695e-03 -8.075 7.16e-16 ***
## year2017 1.375e-01 4.680e-03 29.381 < 2e-16 ***
## year2018 8.547e-02 8.360e-03 10.223 < 2e-16 ***
## x_large_bags 3.583e-07 1.246e-07 2.877 0.004025 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.2482 on 18187 degrees of freedom
## Multiple R-squared: 0.6214, Adjusted R-squared: 0.6202
## F-statistic: 489.4 on 61 and 18187 DF, p-value: < 2.2e-16
Overall, we still have some heteroscedasticity and deviations from normality in the residuals. In the regression summary, x_large_bags is statistically significant (p = 0.004). But hmmm… with four predictors our R^2 was 0.6213, and with five we’ve only reached 0.6214. Given that there is no real increase in explanatory performance, despite the significance, we’ll drop x_large_bags and stick with model4b.
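Significance and usefulness come apart here because of sample size: with about 18,000 rows, even a tiny effect produces a small p-value. The simulated sketch below (hypothetical data, base R only) shows a coefficient that is clearly significant while adding almost nothing to R^2. As a rough check on the real output, dropping a single term changes BIC by approximately log(n) - t^2; for x_large_bags in model5 that is log(18249) - 2.877^2, roughly 9.81 - 8.28 > 0, so BIC would also lean towards the smaller model.

```r
set.seed(2024)
n <- 18000   # roughly the size of trimmed_avocados
sim <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
# x2 has a real but tiny effect relative to the noise
sim$y <- 1 + 0.5 * sim$x1 + 0.05 * sim$x2 + rnorm(n)

base <- lm(y ~ x1, data = sim)
plus <- lm(y ~ x1 + x2, data = sim)

p_x2     <- summary(plus)$coefficients["x2", "Pr(>|t|)"]              # significance of x2
delta_r2 <- summary(plus)$r.squared - summary(base)$r.squared         # R^2 gained by x2
c(p_value = p_x2, delta_r2 = delta_r2)
```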
It’s also clear we aren’t gaining much by adding further main effects, so the final thing we can do is test for interactions.
Let’s now think about possible pairwise interactions. With four main effects (type + region + quarter + year), there are six possible pairwise interactions: type:region, type:quarter, type:year, region:quarter, region:year and quarter:year. Let’s test them now:
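Rather than writing each model out by hand, the six pairwise terms can be generated and fitted in a loop. A minimal sketch follows, assuming the main effects are named as in the script; the data frame `sim` is simulated stand-in data, so swap in `trimmed_avocados` to run it for real. Adjusted R^2 is used for the comparison because interaction terms can add many coefficients: type:region alone takes model5pa from 60 to 113 model degrees of freedom.

```r
mains <- c("type", "region", "quarter", "year")
# All pairwise interaction labels: choose(4, 2) = 6 of them
pair_terms <- combn(mains, 2, FUN = function(p) paste(p, collapse = ":"))
pair_terms

# Simulated stand-in data with the same column names as the script
set.seed(99)
n <- 800
sim <- data.frame(
  type    = factor(sample(c("conventional", "organic"), n, replace = TRUE)),
  region  = factor(sample(c("A", "B", "C"), n, replace = TRUE)),
  quarter = factor(sample(1:4, n, replace = TRUE)),
  year    = factor(sample(2015:2018, n, replace = TRUE))
)
sim$average_price <- 1.2 + 0.5 * (sim$type == "organic") + rnorm(n, sd = 0.25)

# Fit main effects + one interaction at a time and record adjusted R^2
results <- sapply(pair_terms, function(term) {
  f <- reformulate(c(mains, term), response = "average_price")
  summary(lm(f, data = sim))$adj.r.squared
})
sort(results, decreasing = TRUE)
```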
model5pa <- lm(average_price ~ type + region + quarter + year + type:region, data = trimmed_avocados)
summary(model5pa)
##
## Call:
## lm(formula = average_price ~ type + region + quarter + year +
## type:region, data = trimmed_avocados)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.0082 -0.1335 -0.0024 0.1335 1.4799
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.202843 0.018542 64.870 < 2e-16 ***
## typeorganic 0.424556 0.025580 16.597 < 2e-16 ***
## regionAtlanta -0.279941 0.025580 -10.944 < 2e-16 ***
## regionBaltimoreWashington -0.004556 0.025580 -0.178 0.858635
## regionBoise -0.272722 0.025580 -10.661 < 2e-16 ***
## regionBoston -0.044379 0.025580 -1.735 0.082778 .
## regionBuffaloRochester 0.033550 0.025580 1.312 0.189681
## regionCalifornia -0.243314 0.025580 -9.512 < 2e-16 ***
## regionCharlotte -0.073669 0.025580 -2.880 0.003983 **
## regionChicago 0.020592 0.025580 0.805 0.420838
## regionCincinnatiDayton -0.333254 0.025580 -13.028 < 2e-16 ***
## regionColumbus -0.282485 0.025580 -11.043 < 2e-16 ***
## regionDallasFtWorth -0.502308 0.025580 -19.637 < 2e-16 ***
## regionDenver -0.274793 0.025580 -10.742 < 2e-16 ***
## regionDetroit -0.224793 0.025580 -8.788 < 2e-16 ***
## regionGrandRapids -0.023728 0.025580 -0.928 0.353635
## regionGreatLakes -0.166864 0.025580 -6.523 7.07e-11 ***
## regionHarrisburgScranton -0.089941 0.025580 -3.516 0.000439 ***
## regionHartfordSpringfield 0.059290 0.025580 2.318 0.020471 *
## regionHouston -0.523669 0.025580 -20.472 < 2e-16 ***
## regionIndianapolis -0.203905 0.025580 -7.971 1.66e-15 ***
## regionJacksonville -0.155148 0.025580 -6.065 1.34e-09 ***
## regionLasVegas -0.335799 0.025580 -13.127 < 2e-16 ***
## regionLosAngeles -0.372308 0.025580 -14.555 < 2e-16 ***
## regionLouisville -0.243432 0.025580 -9.516 < 2e-16 ***
## regionMiamiFtLauderdale -0.094438 0.025580 -3.692 0.000223 ***
## regionMidsouth -0.141598 0.025580 -5.535 3.15e-08 ***
## regionNashville -0.335858 0.025580 -13.130 < 2e-16 ***
## regionNewOrleansMobile -0.263491 0.025580 -10.301 < 2e-16 ***
## regionNewYork 0.053373 0.025580 2.086 0.036948 *
## regionNortheast -0.004320 0.025580 -0.169 0.865907
## regionNorthernNewEngland -0.088521 0.025580 -3.461 0.000540 ***
## regionOrlando -0.134320 0.025580 -5.251 1.53e-07 ***
## regionPhiladelphia 0.047574 0.025580 1.860 0.062930 .
## regionPhoenixTucson -0.620533 0.025580 -24.258 < 2e-16 ***
## regionPittsburgh -0.098107 0.025580 -3.835 0.000126 ***
## regionPlains -0.183254 0.025580 -7.164 8.14e-13 ***
## regionPortland -0.302249 0.025580 -11.816 < 2e-16 ***
## regionRaleighGreensboro -0.121657 0.025580 -4.756 1.99e-06 ***
## regionRichmondNorfolk -0.228935 0.025580 -8.950 < 2e-16 ***
## regionRoanoke -0.252722 0.025580 -9.880 < 2e-16 ***
## regionSacramento -0.074793 0.025580 -2.924 0.003461 **
## regionSanDiego -0.287278 0.025580 -11.230 < 2e-16 ***
## regionSanFrancisco 0.048402 0.025580 1.892 0.058483 .
## regionSeattle -0.178994 0.025580 -6.997 2.70e-12 ***
## regionSouthCarolina -0.202544 0.025580 -7.918 2.55e-15 ***
## regionSouthCentral -0.479349 0.025580 -18.739 < 2e-16 ***
## regionSoutheast -0.185740 0.025580 -7.261 4.00e-13 ***
## regionSpokane -0.232781 0.025580 -9.100 < 2e-16 ***
## regionStLouis -0.163018 0.025580 -6.373 1.90e-10 ***
## regionSyracuse 0.038166 0.025580 1.492 0.135716
## regionTampa -0.147160 0.025580 -5.753 8.91e-09 ***
## regionTotalUS -0.256746 0.025580 -10.037 < 2e-16 ***
## regionWest -0.363669 0.025580 -14.217 < 2e-16 ***
## regionWestTexNewMexico -0.506627 0.025580 -19.805 < 2e-16 ***
## quarter2 0.081206 0.005125 15.846 < 2e-16 ***
## quarter3 0.218901 0.005124 42.721 < 2e-16 ***
## quarter4 0.162013 0.005092 31.814 < 2e-16 ***
## year2016 -0.037010 0.004438 -8.340 < 2e-16 ***
## year2017 0.138688 0.004417 31.396 < 2e-16 ***
## year2018 0.087411 0.007895 11.071 < 2e-16 ***
## typeorganic:regionAtlanta 0.113728 0.036176 3.144 0.001671 **
## typeorganic:regionBaltimoreWashington -0.044497 0.036176 -1.230 0.218705
## typeorganic:regionBoise 0.119645 0.036176 3.307 0.000944 ***
## typeorganic:regionBoston 0.028462 0.036176 0.787 0.431435
## typeorganic:regionBuffaloRochester -0.155503 0.036176 -4.299 1.73e-05 ***
## typeorganic:regionCalifornia 0.155207 0.036176 4.290 1.79e-05 ***
## typeorganic:regionCharlotte 0.237337 0.036176 6.561 5.50e-11 ***
## typeorganic:regionChicago -0.049704 0.036176 -1.374 0.169471
## typeorganic:regionCincinnatiDayton -0.037160 0.036176 -1.027 0.304341
## typeorganic:regionColumbus -0.051538 0.036176 -1.425 0.154271
## typeorganic:regionDallasFtWorth 0.053728 0.036176 1.485 0.137512
## typeorganic:regionDenver -0.135325 0.036176 -3.741 0.000184 ***
## typeorganic:regionDetroit -0.120296 0.036176 -3.325 0.000885 ***
## typeorganic:regionGrandRapids -0.064615 0.036176 -1.786 0.074092 .
## typeorganic:regionGreatLakes -0.111243 0.036176 -3.075 0.002108 **
## typeorganic:regionHarrisburgScranton 0.084379 0.036176 2.332 0.019687 *
## typeorganic:regionHartfordSpringfield 0.396627 0.036176 10.964 < 2e-16 ***
## typeorganic:regionHouston 0.021124 0.036176 0.584 0.559273
## typeorganic:regionIndianapolis -0.086272 0.036176 -2.385 0.017099 *
## typeorganic:regionJacksonville 0.210118 0.036176 5.808 6.42e-09 ***
## typeorganic:regionLasVegas 0.311361 0.036176 8.607 < 2e-16 ***
## typeorganic:regionLosAngeles 0.054556 0.036176 1.508 0.131550
## typeorganic:regionLouisville -0.061834 0.036176 -1.709 0.087418 .
## typeorganic:regionMiamiFtLauderdale -0.076213 0.036176 -2.107 0.035154 *
## typeorganic:regionMidsouth -0.029349 0.036176 -0.811 0.417210
## typeorganic:regionNashville -0.026154 0.036176 -0.723 0.469711
## typeorganic:regionNewOrleansMobile 0.014497 0.036176 0.401 0.688618
## typeorganic:regionNewYork 0.226331 0.036176 6.256 4.03e-10 ***
## typeorganic:regionNortheast 0.090414 0.036176 2.499 0.012453 *
## typeorganic:regionNorthernNewEngland 0.009763 0.036176 0.270 0.787252
## typeorganic:regionOrlando 0.158994 0.036176 4.395 1.11e-05 ***
## typeorganic:regionPhiladelphia 0.047041 0.036176 1.300 0.193496
## typeorganic:regionPhoenixTucson 0.567870 0.036176 15.697 < 2e-16 ***
## typeorganic:regionPittsburgh -0.197219 0.036176 -5.452 5.05e-08 ***
## typeorganic:regionPlains 0.117456 0.036176 3.247 0.001169 **
## typeorganic:regionPortland 0.117870 0.036176 3.258 0.001123 **
## typeorganic:regionRaleighGreensboro 0.231479 0.036176 6.399 1.61e-10 ***
## typeorganic:regionRichmondNorfolk -0.081538 0.036176 -2.254 0.024211 *
## typeorganic:regionRoanoke -0.120769 0.036176 -3.338 0.000844 ***
## typeorganic:regionSacramento 0.270651 0.036176 7.482 7.68e-14 ***
## typeorganic:regionSanDiego 0.248817 0.036176 6.878 6.27e-12 ***
## typeorganic:regionSanFrancisco 0.389527 0.036176 10.768 < 2e-16 ***
## typeorganic:regionSeattle 0.121065 0.036176 3.347 0.000820 ***
## typeorganic:regionSouthCarolina 0.089586 0.036176 2.476 0.013281 *
## typeorganic:regionSouthCentral 0.039112 0.036176 1.081 0.279633
## typeorganic:regionSoutheast 0.045444 0.036176 1.256 0.209063
## typeorganic:regionSpokane 0.234675 0.036176 6.487 8.98e-11 ***
## typeorganic:regionStLouis 0.065207 0.036176 1.803 0.071483 .
## typeorganic:regionSyracuse -0.157751 0.036176 -4.361 1.30e-05 ***
## typeorganic:regionTampa -0.010059 0.036176 -0.278 0.780967
## typeorganic:regionTotalUS 0.029467 0.036176 0.815 0.415334
## typeorganic:regionWest 0.149704 0.036176 4.138 3.52e-05 ***
## typeorganic:regionWestTexNewMexico 0.423157 0.036257 11.671 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.2351 on 18135 degrees of freedom
## Multiple R-squared: 0.6611, Adjusted R-squared: 0.659
## F-statistic: 313.1 on 113 and 18135 DF, p-value: < 2.2e-16
model5pb <- lm(average_price ~ type + region + quarter + year + type:quarter, data = trimmed_avocados)
summary(model5pb)
##
## Call:
## lm(formula = average_price ~ type + region + quarter + year +
## type:quarter, data = trimmed_avocados)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.02358 -0.14643 -0.00311 0.14370 1.44227
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.180432 0.014545 81.158 < 2e-16 ***
## typeorganic 0.469434 0.006682 70.256 < 2e-16 ***
## regionAtlanta -0.223077 0.019073 -11.696 < 2e-16 ***
## regionBaltimoreWashington -0.026805 0.019073 -1.405 0.159924
## regionBoise -0.212899 0.019073 -11.162 < 2e-16 ***
## regionBoston -0.030148 0.019073 -1.581 0.113971
## regionBuffaloRochester -0.044201 0.019073 -2.317 0.020488 *
## regionCalifornia -0.165710 0.019073 -8.688 < 2e-16 ***
## regionCharlotte 0.045000 0.019073 2.359 0.018316 *
## regionChicago -0.004260 0.019073 -0.223 0.823248
## regionCincinnatiDayton -0.351834 0.019073 -18.447 < 2e-16 ***
## regionColumbus -0.308254 0.019073 -16.162 < 2e-16 ***
## regionDallasFtWorth -0.475444 0.019073 -24.928 < 2e-16 ***
## regionDenver -0.342456 0.019073 -17.955 < 2e-16 ***
## regionDetroit -0.284941 0.019073 -14.940 < 2e-16 ***
## regionGrandRapids -0.056036 0.019073 -2.938 0.003308 **
## regionGreatLakes -0.222485 0.019073 -11.665 < 2e-16 ***
## regionHarrisburgScranton -0.047751 0.019073 -2.504 0.012301 *
## regionHartfordSpringfield 0.257604 0.019073 13.506 < 2e-16 ***
## regionHouston -0.513107 0.019073 -26.902 < 2e-16 ***
## regionIndianapolis -0.247041 0.019073 -12.953 < 2e-16 ***
## regionJacksonville -0.050089 0.019073 -2.626 0.008642 **
## regionLasVegas -0.180118 0.019073 -9.444 < 2e-16 ***
## regionLosAngeles -0.345030 0.019073 -18.090 < 2e-16 ***
## regionLouisville -0.274349 0.019073 -14.384 < 2e-16 ***
## regionMiamiFtLauderdale -0.132544 0.019073 -6.949 3.79e-12 ***
## regionMidsouth -0.156272 0.019073 -8.193 2.71e-16 ***
## regionNashville -0.348935 0.019073 -18.295 < 2e-16 ***
## regionNewOrleansMobile -0.256243 0.019073 -13.435 < 2e-16 ***
## regionNewYork 0.166538 0.019073 8.732 < 2e-16 ***
## regionNortheast 0.040888 0.019073 2.144 0.032066 *
## regionNorthernNewEngland -0.083639 0.019073 -4.385 1.17e-05 ***
## regionOrlando -0.054822 0.019073 -2.874 0.004053 **
## regionPhiladelphia 0.071095 0.019073 3.728 0.000194 ***
## regionPhoenixTucson -0.336598 0.019073 -17.648 < 2e-16 ***
## regionPittsburgh -0.196716 0.019073 -10.314 < 2e-16 ***
## regionPlains -0.124527 0.019073 -6.529 6.80e-11 ***
## regionPortland -0.243314 0.019073 -12.757 < 2e-16 ***
## regionRaleighGreensboro -0.005917 0.019073 -0.310 0.756382
## regionRichmondNorfolk -0.269704 0.019073 -14.141 < 2e-16 ***
## regionRoanoke -0.313107 0.019073 -16.416 < 2e-16 ***
## regionSacramento 0.060533 0.019073 3.174 0.001507 **
## regionSanDiego -0.162870 0.019073 -8.539 < 2e-16 ***
## regionSanFrancisco 0.243166 0.019073 12.749 < 2e-16 ***
## regionSeattle -0.118462 0.019073 -6.211 5.38e-10 ***
## regionSouthCarolina -0.157751 0.019073 -8.271 < 2e-16 ***
## regionSouthCentral -0.459793 0.019073 -24.107 < 2e-16 ***
## regionSoutheast -0.163018 0.019073 -8.547 < 2e-16 ***
## regionSpokane -0.115444 0.019073 -6.053 1.45e-09 ***
## regionStLouis -0.130414 0.019073 -6.838 8.30e-12 ***
## regionSyracuse -0.040710 0.019073 -2.134 0.032819 *
## regionTampa -0.152189 0.019073 -7.979 1.56e-15 ***
## regionTotalUS -0.242012 0.019073 -12.689 < 2e-16 ***
## regionWest -0.288817 0.019073 -15.143 < 2e-16 ***
## regionWestTexNewMexico -0.296626 0.019116 -15.518 < 2e-16 ***
## quarter2 0.066217 0.007413 8.933 < 2e-16 ***
## quarter3 0.186137 0.007413 25.110 < 2e-16 ***
## quarter4 0.152474 0.007364 20.706 < 2e-16 ***
## year2016 -0.036977 0.004679 -7.902 2.89e-15 ***
## year2017 0.138659 0.004658 29.768 < 2e-16 ***
## year2018 0.087412 0.008325 10.500 < 2e-16 ***
## typeorganic:quarter2 0.029809 0.010152 2.936 0.003325 **
## typeorganic:quarter3 0.065528 0.010150 6.456 1.10e-10 ***
## typeorganic:quarter4 0.018995 0.010079 1.885 0.059501 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.2479 on 18185 degrees of freedom
## Multiple R-squared: 0.6222, Adjusted R-squared: 0.6209
## F-statistic: 475.3 on 63 and 18185 DF, p-value: < 2.2e-16
model5pc <- lm(average_price ~ type + region + quarter + year + type:year, data = trimmed_avocados)
summary(model5pc)
##
## Call:
## lm(formula = average_price ~ type + region + quarter + year +
## type:year, data = trimmed_avocados)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.00911 -0.14461 -0.00436 0.13900 1.46703
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.117496 0.014421 77.493 < 2e-16 ***
## typeorganic 0.595327 0.006565 90.688 < 2e-16 ***
## regionAtlanta -0.223077 0.018919 -11.791 < 2e-16 ***
## regionBaltimoreWashington -0.026805 0.018919 -1.417 0.156565
## regionBoise -0.212899 0.018919 -11.253 < 2e-16 ***
## regionBoston -0.030148 0.018919 -1.593 0.111069
## regionBuffaloRochester -0.044201 0.018919 -2.336 0.019488 *
## regionCalifornia -0.165710 0.018919 -8.759 < 2e-16 ***
## regionCharlotte 0.045000 0.018919 2.379 0.017393 *
## regionChicago -0.004260 0.018919 -0.225 0.821839
## regionCincinnatiDayton -0.351834 0.018919 -18.596 < 2e-16 ***
## regionColumbus -0.308254 0.018919 -16.293 < 2e-16 ***
## regionDallasFtWorth -0.475444 0.018919 -25.130 < 2e-16 ***
## regionDenver -0.342456 0.018919 -18.101 < 2e-16 ***
## regionDetroit -0.284941 0.018919 -15.061 < 2e-16 ***
## regionGrandRapids -0.056036 0.018919 -2.962 0.003063 **
## regionGreatLakes -0.222485 0.018919 -11.760 < 2e-16 ***
## regionHarrisburgScranton -0.047751 0.018919 -2.524 0.011613 *
## regionHartfordSpringfield 0.257604 0.018919 13.616 < 2e-16 ***
## regionHouston -0.513107 0.018919 -27.121 < 2e-16 ***
## regionIndianapolis -0.247041 0.018919 -13.058 < 2e-16 ***
## regionJacksonville -0.050089 0.018919 -2.647 0.008117 **
## regionLasVegas -0.180118 0.018919 -9.520 < 2e-16 ***
## regionLosAngeles -0.345030 0.018919 -18.237 < 2e-16 ***
## regionLouisville -0.274349 0.018919 -14.501 < 2e-16 ***
## regionMiamiFtLauderdale -0.132544 0.018919 -7.006 2.54e-12 ***
## regionMidsouth -0.156272 0.018919 -8.260 < 2e-16 ***
## regionNashville -0.348935 0.018919 -18.443 < 2e-16 ***
## regionNewOrleansMobile -0.256243 0.018919 -13.544 < 2e-16 ***
## regionNewYork 0.166538 0.018919 8.802 < 2e-16 ***
## regionNortheast 0.040888 0.018919 2.161 0.030698 *
## regionNorthernNewEngland -0.083639 0.018919 -4.421 9.89e-06 ***
## regionOrlando -0.054822 0.018919 -2.898 0.003764 **
## regionPhiladelphia 0.071095 0.018919 3.758 0.000172 ***
## regionPhoenixTucson -0.336598 0.018919 -17.791 < 2e-16 ***
## regionPittsburgh -0.196716 0.018919 -10.398 < 2e-16 ***
## regionPlains -0.124527 0.018919 -6.582 4.77e-11 ***
## regionPortland -0.243314 0.018919 -12.860 < 2e-16 ***
## regionRaleighGreensboro -0.005917 0.018919 -0.313 0.754471
## regionRichmondNorfolk -0.269704 0.018919 -14.255 < 2e-16 ***
## regionRoanoke -0.313107 0.018919 -16.549 < 2e-16 ***
## regionSacramento 0.060533 0.018919 3.199 0.001379 **
## regionSanDiego -0.162870 0.018919 -8.609 < 2e-16 ***
## regionSanFrancisco 0.243166 0.018919 12.853 < 2e-16 ***
## regionSeattle -0.118462 0.018919 -6.261 3.90e-10 ***
## regionSouthCarolina -0.157751 0.018919 -8.338 < 2e-16 ***
## regionSouthCentral -0.459793 0.018919 -24.303 < 2e-16 ***
## regionSoutheast -0.163018 0.018919 -8.616 < 2e-16 ***
## regionSpokane -0.115444 0.018919 -6.102 1.07e-09 ***
## regionStLouis -0.130414 0.018919 -6.893 5.64e-12 ***
## regionSyracuse -0.040710 0.018919 -2.152 0.031430 *
## regionTampa -0.152189 0.018919 -8.044 9.22e-16 ***
## regionTotalUS -0.242012 0.018919 -12.792 < 2e-16 ***
## regionWest -0.288817 0.018919 -15.266 < 2e-16 ***
## regionWestTexNewMexico -0.296641 0.018962 -15.644 < 2e-16 ***
## quarter2 0.081108 0.005360 15.132 < 2e-16 ***
## quarter3 0.218901 0.005359 40.844 < 2e-16 ***
## quarter4 0.161984 0.005327 30.410 < 2e-16 ***
## year2016 0.027632 0.006564 4.210 2.57e-05 ***
## year2017 0.216048 0.006533 33.069 < 2e-16 ***
## year2018 0.165421 0.011209 14.758 < 2e-16 ***
## typeorganic:year2016 -0.129237 0.009283 -13.921 < 2e-16 ***
## typeorganic:year2017 -0.154818 0.009240 -16.755 < 2e-16 ***
## typeorganic:year2018 -0.156037 0.015159 -10.293 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.246 on 18185 degrees of freedom
## Multiple R-squared: 0.6282, Adjusted R-squared: 0.6269
## F-statistic: 487.7 on 63 and 18185 DF, p-value: < 2.2e-16
model5pd <- lm(average_price ~ type + region + quarter + year + region:quarter, data = trimmed_avocados)
summary(model5pd)
##
## Call:
## lm(formula = average_price ~ type + region + quarter + year +
## region:quarter, data = trimmed_avocados)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.06598 -0.14588 0.00059 0.14115 1.38051
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.216463 0.024241 50.182 < 2e-16 ***
## typeorganic 0.495917 0.003583 138.408 < 2e-16 ***
## regionAtlanta -0.257647 0.033888 -7.603 3.04e-14 ***
## regionBaltimoreWashington -0.089804 0.033888 -2.650 0.008056 **
## regionBoise -0.285392 0.033888 -8.422 < 2e-16 ***
## regionBoston -0.007059 0.033888 -0.208 0.835000
## regionBuffaloRochester -0.031078 0.033888 -0.917 0.359111
## regionCalifornia -0.279706 0.033888 -8.254 < 2e-16 ***
## regionCharlotte -0.021471 0.033888 -0.634 0.526370
## regionChicago -0.073627 0.033888 -2.173 0.029820 *
## regionCincinnatiDayton -0.434902 0.033888 -12.833 < 2e-16 ***
## regionColumbus -0.324804 0.033888 -9.585 < 2e-16 ***
## regionDallasFtWorth -0.484510 0.033888 -14.297 < 2e-16 ***
## regionDenver -0.421569 0.033888 -12.440 < 2e-16 ***
## regionDetroit -0.305000 0.033888 -9.000 < 2e-16 ***
## regionGrandRapids -0.128235 0.033888 -3.784 0.000155 ***
## regionGreatLakes -0.268137 0.033888 -7.912 2.67e-15 ***
## regionHarrisburgScranton -0.060000 0.033888 -1.771 0.076657 .
## regionHartfordSpringfield 0.229020 0.033888 6.758 1.44e-11 ***
## regionHouston -0.537059 0.033888 -15.848 < 2e-16 ***
## regionIndianapolis -0.273824 0.033888 -8.080 6.87e-16 ***
## regionJacksonville -0.110392 0.033888 -3.258 0.001126 **
## regionLasVegas -0.290686 0.033888 -8.578 < 2e-16 ***
## regionLosAngeles -0.433039 0.033888 -12.778 < 2e-16 ***
## regionLouisville -0.295490 0.033888 -8.720 < 2e-16 ***
## regionMiamiFtLauderdale -0.111863 0.033888 -3.301 0.000966 ***
## regionMidsouth -0.194510 0.033888 -5.740 9.64e-09 ***
## regionNashville -0.351275 0.033888 -10.366 < 2e-16 ***
## regionNewOrleansMobile -0.317255 0.033888 -9.362 < 2e-16 ***
## regionNewYork 0.105098 0.033888 3.101 0.001930 **
## regionNortheast 0.020000 0.033888 0.590 0.555082
## regionNorthernNewEngland -0.059804 0.033888 -1.765 0.077625 .
## regionOrlando -0.103431 0.033888 -3.052 0.002276 **
## regionPhiladelphia 0.016569 0.033888 0.489 0.624905
## regionPhoenixTucson -0.445294 0.033888 -13.140 < 2e-16 ***
## regionPittsburgh -0.174510 0.033888 -5.150 2.64e-07 ***
## regionPlains -0.184412 0.033888 -5.442 5.34e-08 ***
## regionPortland -0.353235 0.033888 -10.424 < 2e-16 ***
## regionRaleighGreensboro -0.058039 0.033888 -1.713 0.086792 .
## regionRichmondNorfolk -0.263627 0.033888 -7.779 7.68e-15 ***
## regionRoanoke -0.312255 0.033888 -9.214 < 2e-16 ***
## regionSacramento -0.027059 0.033888 -0.798 0.424608
## regionSanDiego -0.286667 0.033888 -8.459 < 2e-16 ***
## regionSanFrancisco 0.090588 0.033888 2.673 0.007521 **
## regionSeattle -0.258824 0.033888 -7.638 2.32e-14 ***
## regionSouthCarolina -0.206961 0.033888 -6.107 1.04e-09 ***
## regionSouthCentral -0.475686 0.033888 -14.037 < 2e-16 ***
## regionSoutheast -0.207255 0.033888 -6.116 9.80e-10 ***
## regionSpokane -0.269608 0.033888 -7.956 1.88e-15 ***
## regionStLouis -0.190980 0.033888 -5.636 1.77e-08 ***
## regionSyracuse -0.027647 0.033888 -0.816 0.414609
## regionTampa -0.153235 0.033888 -4.522 6.17e-06 ***
## regionTotalUS -0.290392 0.033888 -8.569 < 2e-16 ***
## regionWest -0.389020 0.033888 -11.479 < 2e-16 ***
## regionWestTexNewMexico -0.365980 0.033888 -10.800 < 2e-16 ***
## quarter2 0.085685 0.036447 2.351 0.018736 *
## quarter3 0.093249 0.036447 2.558 0.010521 *
## quarter4 0.071967 0.036188 1.989 0.046752 *
## year2016 -0.036996 0.004567 -8.100 5.83e-16 ***
## year2017 0.138600 0.004546 30.485 < 2e-16 ***
## year2018 0.087387 0.008126 10.754 < 2e-16 ***
## regionAtlanta:quarter2 -0.088379 0.051480 -1.717 0.086041 .
## regionBaltimoreWashington:quarter2 0.092368 0.051480 1.794 0.072790 .
## regionBoise:quarter2 -0.095505 0.051480 -1.855 0.063585 .
## regionBoston:quarter2 0.011418 0.051480 0.222 0.824479
## regionBuffaloRochester:quarter2 0.081719 0.051480 1.587 0.112440
## regionCalifornia:quarter2 0.003552 0.051480 0.069 0.944992
## regionCharlotte:quarter2 0.062240 0.051480 1.209 0.226676
## regionChicago:quarter2 -0.004193 0.051480 -0.081 0.935085
## regionCincinnatiDayton:quarter2 0.010030 0.051480 0.195 0.845524
## regionColumbus:quarter2 -0.094042 0.051480 -1.827 0.067751 .
## regionDallasFtWorth:quarter2 -0.078439 0.051480 -1.524 0.127607
## regionDenver:quarter2 -0.015739 0.051480 -0.306 0.759813
## regionDetroit:quarter2 -0.036923 0.051480 -0.717 0.473241
## regionGrandRapids:quarter2 0.135799 0.051480 2.638 0.008349 **
## regionGreatLakes:quarter2 -0.011478 0.051480 -0.223 0.823567
## regionHarrisburgScranton:quarter2 0.065513 0.051480 1.273 0.203181
## regionHartfordSpringfield:quarter2 0.067262 0.051480 1.307 0.191375
## regionHouston:quarter2 -0.089223 0.051480 -1.733 0.083084 .
## regionIndianapolis:quarter2 -0.064253 0.051480 -1.248 0.212003
## regionJacksonville:quarter2 0.028213 0.051480 0.548 0.583677
## regionLasVegas:quarter2 -0.074314 0.051480 -1.444 0.148885
## regionLosAngeles:quarter2 -0.060679 0.051480 -1.179 0.238540
## regionLouisville:quarter2 -0.074510 0.051480 -1.447 0.147816
## regionMiamiFtLauderdale:quarter2 -0.009676 0.051480 -0.188 0.850917
## regionMidsouth:quarter2 -0.013952 0.051480 -0.271 0.786385
## regionNashville:quarter2 -0.102572 0.051480 -1.992 0.046336 *
## regionNewOrleansMobile:quarter2 0.083793 0.051480 1.628 0.103609
## regionNewYork:quarter2 0.087722 0.051480 1.704 0.088397 .
## regionNortheast:quarter2 0.056410 0.051480 1.096 0.273195
## regionNorthernNewEngland:quarter2 -0.067632 0.051480 -1.314 0.188947
## regionOrlando:quarter2 0.018047 0.051480 0.351 0.725924
## regionPhiladelphia:quarter2 0.109970 0.051480 2.136 0.032680 *
## regionPhoenixTucson:quarter2 -0.020090 0.051480 -0.390 0.696351
## regionPittsburgh:quarter2 -0.038054 0.051480 -0.739 0.459792
## regionPlains:quarter2 -0.002896 0.051480 -0.056 0.955141
## regionPortland:quarter2 -0.045354 0.051480 -0.881 0.378324
## regionRaleighGreensboro:quarter2 0.001885 0.051480 0.037 0.970786
## regionRichmondNorfolk:quarter2 -0.113552 0.051480 -2.206 0.027414 *
## regionRoanoke:quarter2 -0.131207 0.051480 -2.549 0.010821 *
## regionSacramento:quarter2 0.084238 0.051480 1.636 0.101788
## regionSanDiego:quarter2 -0.003333 0.051480 -0.065 0.948374
## regionSanFrancisco:quarter2 0.121976 0.051480 2.369 0.017828 *
## regionSeattle:quarter2 0.012029 0.051480 0.234 0.815254
## regionSouthCarolina:quarter2 0.027602 0.051480 0.536 0.591851
## regionSouthCentral:quarter2 -0.072262 0.051480 -1.404 0.160426
## regionSoutheast:quarter2 -0.005950 0.051480 -0.116 0.907984
## regionSpokane:quarter2 0.009736 0.051480 0.189 0.849999
## regionStLouis:quarter2 0.057006 0.051480 1.107 0.268161
## regionSyracuse:quarter2 0.064955 0.051480 1.262 0.207057
## regionTampa:quarter2 0.006056 0.051480 0.118 0.906359
## regionTotalUS:quarter2 -0.009223 0.051480 -0.179 0.857813
## regionWest:quarter2 -0.029186 0.051480 -0.567 0.570770
## regionWestTexNewMexico:quarter2 -0.096213 0.051672 -1.862 0.062620 .
## regionAtlanta:quarter3 0.122391 0.051480 2.377 0.017444 *
## regionBaltimoreWashington:quarter3 0.095830 0.051480 1.861 0.062691 .
## regionBoise:quarter3 0.251931 0.051480 4.894 9.98e-07 ***
## regionBoston:quarter3 -0.001146 0.051480 -0.022 0.982235
## regionBuffaloRochester:quarter3 -0.034050 0.051480 -0.661 0.508354
## regionCalifornia:quarter3 0.255860 0.051480 4.970 6.75e-07 ***
## regionCharlotte:quarter3 0.139804 0.051480 2.716 0.006620 **
## regionChicago:quarter3 0.174012 0.051480 3.380 0.000726 ***
## regionCincinnatiDayton:quarter3 0.212594 0.051480 4.130 3.65e-05 ***
## regionColumbus:quarter3 0.109291 0.051480 2.123 0.033769 *
## regionDallasFtWorth:quarter3 0.023228 0.051480 0.451 0.651852
## regionDenver:quarter3 0.212466 0.051480 4.127 3.69e-05 ***
## regionDetroit:quarter3 0.054872 0.051480 1.066 0.286490
## regionGrandRapids:quarter3 0.091440 0.051480 1.776 0.075712 .
## regionGreatLakes:quarter3 0.123522 0.051480 2.399 0.016432 *
## regionHarrisburgScranton:quarter3 0.006795 0.051480 0.132 0.894993
## regionHartfordSpringfield:quarter3 0.049442 0.051480 0.960 0.336862
## regionHouston:quarter3 0.072059 0.051480 1.400 0.161608
## regionIndianapolis:quarter3 0.092157 0.051480 1.790 0.073447 .
## regionJacksonville:quarter3 0.168213 0.051480 3.268 0.001087 **
## regionLasVegas:quarter3 0.295302 0.051480 5.736 9.84e-09 ***
## regionLosAngeles:quarter3 0.214578 0.051480 4.168 3.08e-05 ***
## regionLouisville:quarter3 0.084721 0.051480 1.646 0.099842 .
## regionMiamiFtLauderdale:quarter3 -0.072240 0.051480 -1.403 0.160557
## regionMidsouth:quarter3 0.095407 0.051480 1.853 0.063858 .
## regionNashville:quarter3 0.041531 0.051480 0.807 0.419828
## regionNewOrleansMobile:quarter3 0.071357 0.051480 1.386 0.165728
## regionNewYork:quarter3 0.112338 0.051480 2.182 0.029110 *
## regionNortheast:quarter3 0.050256 0.051480 0.976 0.328963
## regionNorthernNewEngland:quarter3 -0.013658 0.051480 -0.265 0.790782
## regionOrlando:quarter3 0.116252 0.051480 2.258 0.023946 *
## regionPhiladelphia:quarter3 0.082149 0.051480 1.596 0.110562
## regionPhoenixTucson:quarter3 0.260038 0.051480 5.051 4.43e-07 ***
## regionPittsburgh:quarter3 -0.016131 0.051480 -0.313 0.754019
## regionPlains:quarter3 0.136335 0.051480 2.648 0.008097 **
## regionPortland:quarter3 0.334261 0.051480 6.493 8.63e-11 ***
## regionRaleighGreensboro:quarter3 0.121373 0.051480 2.358 0.018401 *
## regionRichmondNorfolk:quarter3 0.051576 0.051480 1.002 0.316421
## regionRoanoke:quarter3 0.090460 0.051480 1.757 0.078903 .
## regionSacramento:quarter3 0.181161 0.051480 3.519 0.000434 ***
## regionSanDiego:quarter3 0.280385 0.051480 5.446 5.21e-08 ***
## regionSanFrancisco:quarter3 0.312360 0.051480 6.068 1.32e-09 ***
## regionSeattle:quarter3 0.392029 0.051480 7.615 2.76e-14 ***
## regionSouthCarolina:quarter3 0.102345 0.051480 1.988 0.046820 *
## regionSouthCentral:quarter3 0.042609 0.051480 0.828 0.407859
## regionSoutheast:quarter3 0.111357 0.051480 2.163 0.030545 *
## regionSpokane:quarter3 0.393582 0.051480 7.645 2.19e-14 ***
## regionStLouis:quarter3 0.192134 0.051480 3.732 0.000190 ***
## regionSyracuse:quarter3 -0.036840 0.051480 -0.716 0.474236
## regionTampa:quarter3 -0.043047 0.051480 -0.836 0.403063
## regionTotalUS:quarter3 0.104751 0.051480 2.035 0.041887 *
## regionWest:quarter3 0.297609 0.051480 5.781 7.55e-09 ***
## regionWestTexNewMexico:quarter3 0.178160 0.051480 3.461 0.000540 ***
## regionAtlanta:quarter4 0.112897 0.051114 2.209 0.027206 *
## regionBaltimoreWashington:quarter4 0.082679 0.051114 1.618 0.105780
## regionBoise:quarter4 0.153767 0.051114 3.008 0.002631 **
## regionBoston:quarter4 -0.107566 0.051114 -2.104 0.035355 *
## regionBuffaloRochester:quarter4 -0.101922 0.051114 -1.994 0.046167 *
## regionCalifornia:quarter4 0.228706 0.051114 4.474 7.71e-06 ***
## regionCharlotte:quarter4 0.083846 0.051114 1.640 0.100948
## regionChicago:quarter4 0.127502 0.051114 2.494 0.012624 *
## regionCincinnatiDayton:quarter4 0.133902 0.051114 2.620 0.008809 **
## regionColumbus:quarter4 0.055054 0.051114 1.077 0.281460
## regionDallasFtWorth:quarter4 0.092135 0.051114 1.803 0.071479 .
## regionDenver:quarter4 0.142444 0.051114 2.787 0.005329 **
## regionDetroit:quarter4 0.067250 0.051114 1.316 0.188297
## regionGrandRapids:quarter4 0.083485 0.051114 1.633 0.102421
## regionGreatLakes:quarter4 0.083637 0.051114 1.636 0.101797
## regionHarrisburgScranton:quarter4 -0.018750 0.051114 -0.367 0.713753
## regionHartfordSpringfield:quarter4 0.006980 0.051114 0.137 0.891376
## regionHouston:quarter4 0.117934 0.051114 2.307 0.021051 *
## regionIndianapolis:quarter4 0.085949 0.051114 1.682 0.092683 .
## regionJacksonville:quarter4 0.063267 0.051114 1.238 0.215820
## regionLasVegas:quarter4 0.251686 0.051114 4.924 8.55e-07 ***
## regionLosAngeles:quarter4 0.221789 0.051114 4.339 1.44e-05 ***
## regionLouisville:quarter4 0.079365 0.051114 1.553 0.120511
## regionMiamiFtLauderdale:quarter4 -0.007512 0.051114 -0.147 0.883157
## regionMidsouth:quarter4 0.082135 0.051114 1.607 0.108096
## regionNashville:quarter4 0.069400 0.051114 1.358 0.174564
## regionNewOrleansMobile:quarter4 0.106505 0.051114 2.084 0.037204 *
## regionNewYork:quarter4 0.064527 0.051114 1.262 0.206818
## regionNortheast:quarter4 -0.015750 0.051114 -0.308 0.757984
## regionNorthernNewEngland:quarter4 -0.021446 0.051114 -0.420 0.674803
## regionOrlando:quarter4 0.074431 0.051114 1.456 0.145360
## regionPhiladelphia:quarter4 0.043056 0.051114 0.842 0.399599
## regionPhoenixTucson:quarter4 0.225294 0.051114 4.408 1.05e-05 ***
## regionPittsburgh:quarter4 -0.040990 0.051114 -0.802 0.422601
## regionPlains:quarter4 0.122912 0.051114 2.405 0.016198 *
## regionPortland:quarter4 0.182735 0.051114 3.575 0.000351 ***
## regionRaleighGreensboro:quarter4 0.100039 0.051114 1.957 0.050342 .
## regionRichmondNorfolk:quarter4 0.034752 0.051114 0.680 0.496577
## regionRoanoke:quarter4 0.036130 0.051114 0.707 0.479670
## regionSacramento:quarter4 0.111309 0.051114 2.178 0.029445 *
## regionSanDiego:quarter4 0.252917 0.051114 4.948 7.56e-07 ***
## regionSanFrancisco:quarter4 0.221162 0.051114 4.327 1.52e-05 ***
## regionSeattle:quarter4 0.199074 0.051114 3.895 9.87e-05 ***
## regionSouthCarolina:quarter4 0.081211 0.051114 1.589 0.112120
## regionSouthCentral:quarter4 0.096061 0.051114 1.879 0.060213 .
## regionSoutheast:quarter4 0.084130 0.051114 1.646 0.099797 .
## regionSpokane:quarter4 0.258108 0.051114 5.050 4.47e-07 ***
## regionStLouis:quarter4 0.012980 0.051114 0.254 0.799538
## regionSyracuse:quarter4 -0.082603 0.051114 -1.616 0.106101
## regionTampa:quarter4 0.040485 0.051114 0.792 0.428338
## regionTotalUS:quarter4 0.111267 0.051114 2.177 0.029506 *
## regionWest:quarter4 0.161645 0.051114 3.162 0.001567 **
## regionWestTexNewMexico:quarter4 0.211605 0.051205 4.133 3.60e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.242 on 18029 degrees of freedom
## Multiple R-squared: 0.6431, Adjusted R-squared: 0.6388
## F-statistic: 148.4 on 219 and 18029 DF, p-value: < 2.2e-16
model5pe <- lm(average_price ~ type + region + quarter + year + region:year, data = trimmed_avocados)
summary(model5pe)
##
## Call:
## lm(formula = average_price ~ type + region + quarter + year +
## region:year, data = trimmed_avocados)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.03093 -0.14190 -0.00143 0.13797 1.38892
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.175e+00 2.396e-02 49.047 < 2e-16 ***
## typeorganic 4.959e-01 3.575e-03 138.719 < 2e-16 ***
## regionAtlanta -1.582e-01 3.349e-02 -4.724 2.33e-06 ***
## regionBaltimoreWashington -1.699e-01 3.349e-02 -5.074 3.94e-07 ***
## regionBoise -1.650e-01 3.349e-02 -4.927 8.40e-07 ***
## regionBoston -6.519e-02 3.349e-02 -1.947 0.051566 .
## regionBuffaloRochester 5.865e-03 3.349e-02 0.175 0.860955
## regionCalifornia -2.229e-01 3.349e-02 -6.656 2.89e-11 ***
## regionCharlotte 3.702e-02 3.349e-02 1.106 0.268948
## regionChicago -1.347e-01 3.349e-02 -4.023 5.77e-05 ***
## regionCincinnatiDayton -3.364e-01 3.349e-02 -10.047 < 2e-16 ***
## regionColumbus -2.649e-01 3.349e-02 -7.911 2.70e-15 ***
## regionDallasFtWorth -4.609e-01 3.349e-02 -13.763 < 2e-16 ***
## regionDenver -3.510e-01 3.349e-02 -10.481 < 2e-16 ***
## regionDetroit -2.005e-01 3.349e-02 -5.987 2.18e-09 ***
## regionGrandRapids -1.224e-01 3.349e-02 -3.655 0.000258 ***
## regionGreatLakes -2.125e-01 3.349e-02 -6.346 2.26e-10 ***
## regionHarrisburgScranton -6.712e-02 3.349e-02 -2.004 0.045053 *
## regionHartfordSpringfield 2.090e-01 3.349e-02 6.243 4.40e-10 ***
## regionHouston -4.907e-01 3.349e-02 -14.653 < 2e-16 ***
## regionIndianapolis -1.958e-01 3.349e-02 -5.846 5.11e-09 ***
## regionJacksonville -3.567e-02 3.349e-02 -1.065 0.286744
## regionLasVegas -1.699e-01 3.349e-02 -5.074 3.94e-07 ***
## regionLosAngeles -3.862e-01 3.349e-02 -11.535 < 2e-16 ***
## regionLouisville -2.443e-01 3.349e-02 -7.296 3.08e-13 ***
## regionMiamiFtLauderdale -1.552e-01 3.349e-02 -4.635 3.60e-06 ***
## regionMidsouth -1.874e-01 3.349e-02 -5.597 2.22e-08 ***
## regionNashville -2.615e-01 3.349e-02 -7.810 6.01e-15 ***
## regionNewOrleansMobile -2.711e-01 3.349e-02 -8.095 6.10e-16 ***
## regionNewYork 1.058e-01 3.349e-02 3.159 0.001588 **
## regionNortheast 5.000e-03 3.349e-02 0.149 0.881305
## regionNorthernNewEngland -6.538e-02 3.349e-02 -1.953 0.050881 .
## regionOrlando -3.942e-02 3.349e-02 -1.177 0.239087
## regionPhiladelphia 1.644e-02 3.349e-02 0.491 0.623415
## regionPhoenixTucson -3.816e-01 3.349e-02 -11.397 < 2e-16 ***
## regionPittsburgh -1.315e-01 3.349e-02 -3.928 8.59e-05 ***
## regionPlains -1.009e-01 3.349e-02 -3.012 0.002597 **
## regionPortland -2.319e-01 3.349e-02 -6.926 4.47e-12 ***
## regionRaleighGreensboro -8.933e-02 3.349e-02 -2.668 0.007646 **
## regionRichmondNorfolk -2.642e-01 3.349e-02 -7.891 3.17e-15 ***
## regionRoanoke -3.116e-01 3.349e-02 -9.306 < 2e-16 ***
## regionSacramento -8.471e-02 3.349e-02 -2.530 0.011422 *
## regionSanDiego -2.645e-01 3.349e-02 -7.899 2.96e-15 ***
## regionSanFrancisco 8.231e-02 3.349e-02 2.458 0.013981 *
## regionSeattle -1.165e-01 3.349e-02 -3.480 0.000502 ***
## regionSouthCarolina -8.404e-02 3.349e-02 -2.510 0.012093 *
## regionSouthCentral -4.267e-01 3.349e-02 -12.744 < 2e-16 ***
## regionSoutheast -1.240e-01 3.349e-02 -3.704 0.000213 ***
## regionSpokane -1.384e-01 3.349e-02 -4.132 3.61e-05 ***
## regionStLouis -3.538e-02 3.349e-02 -1.057 0.290659
## regionSyracuse -9.712e-03 3.349e-02 -0.290 0.771804
## regionTampa -1.821e-01 3.349e-02 -5.439 5.44e-08 ***
## regionTotalUS -2.813e-01 3.349e-02 -8.402 < 2e-16 ***
## regionWest -3.010e-01 3.349e-02 -8.988 < 2e-16 ***
## regionWestTexNewMexico -2.766e-01 3.357e-02 -8.239 < 2e-16 ***
## quarter2 8.108e-02 5.262e-03 15.407 < 2e-16 ***
## quarter3 2.189e-01 5.262e-03 41.602 < 2e-16 ***
## quarter4 1.620e-01 5.229e-03 30.974 < 2e-16 ***
## year2016 -4.808e-03 3.349e-02 -0.144 0.885838
## year2017 9.820e-02 3.333e-02 2.947 0.003217 **
## year2018 1.257e-02 5.478e-02 0.230 0.818454
## regionAtlanta:year2016 -1.616e-01 4.736e-02 -3.413 0.000643 ***
## regionBaltimoreWashington:year2016 2.236e-01 4.736e-02 4.721 2.37e-06 ***
## regionBoise:year2016 -2.270e-01 4.736e-02 -4.794 1.65e-06 ***
## regionBoston:year2016 -4.260e-02 4.736e-02 -0.899 0.368404
## regionBuffaloRochester:year2016 -5.596e-02 4.736e-02 -1.182 0.237332
## regionCalifornia:year2016 1.885e-02 4.736e-02 0.398 0.690658
## regionCharlotte:year2016 -7.308e-02 4.736e-02 -1.543 0.122814
## regionChicago:year2016 1.481e-01 4.736e-02 3.127 0.001769 **
## regionCincinnatiDayton:year2016 -1.091e-01 4.736e-02 -2.305 0.021203 *
## regionColumbus:year2016 -8.269e-02 4.736e-02 -1.746 0.080796 .
## regionDallasFtWorth:year2016 -7.692e-02 4.736e-02 -1.624 0.104317
## regionDenver:year2016 -8.981e-02 4.736e-02 -1.896 0.057918 .
## regionDetroit:year2016 -1.611e-01 4.736e-02 -3.401 0.000673 ***
## regionGrandRapids:year2016 9.779e-02 4.736e-02 2.065 0.038940 *
## regionGreatLakes:year2016 -4.442e-02 4.736e-02 -0.938 0.348222
## regionHarrisburgScranton:year2016 4.481e-02 4.736e-02 0.946 0.344065
## regionHartfordSpringfield:year2016 1.081e-01 4.736e-02 2.282 0.022488 *
## regionHouston:year2016 -5.135e-02 4.736e-02 -1.084 0.278264
## regionIndianapolis:year2016 -3.663e-02 4.736e-02 -0.774 0.439177
## regionJacksonville:year2016 -1.306e-01 4.736e-02 -2.757 0.005833 **
## regionLasVegas:year2016 -1.163e-02 4.736e-02 -0.246 0.805929
## regionLosAngeles:year2016 -6.394e-02 4.736e-02 -1.350 0.176953
## regionLouisville:year2016 -7.808e-02 4.736e-02 -1.649 0.099221 .
## regionMiamiFtLauderdale:year2016 -9.894e-02 4.736e-02 -2.089 0.036692 *
## regionMidsouth:year2016 4.327e-03 4.736e-02 0.091 0.927199
## regionNashville:year2016 -1.562e-01 4.736e-02 -3.299 0.000971 ***
## regionNewOrleansMobile:year2016 -1.423e-02 4.736e-02 -0.301 0.763794
## regionNewYork:year2016 1.223e-01 4.736e-02 2.583 0.009810 **
## regionNortheast:year2016 5.673e-02 4.736e-02 1.198 0.230946
## regionNorthernNewEngland:year2016 -7.587e-02 4.736e-02 -1.602 0.109168
## regionOrlando:year2016 -1.237e-01 4.736e-02 -2.613 0.008978 **
## regionPhiladelphia:year2016 1.244e-01 4.736e-02 2.627 0.008611 **
## regionPhoenixTucson:year2016 1.064e-01 4.736e-02 2.248 0.024607 *
## regionPittsburgh:year2016 -5.904e-02 4.736e-02 -1.247 0.212525
## regionPlains:year2016 -5.558e-02 4.736e-02 -1.174 0.240571
## regionPortland:year2016 -1.104e-01 4.736e-02 -2.331 0.019767 *
## regionRaleighGreensboro:year2016 3.173e-03 4.736e-02 0.067 0.946579
## regionRichmondNorfolk:year2016 -5.856e-02 4.736e-02 -1.237 0.216273
## regionRoanoke:year2016 -7.481e-02 4.736e-02 -1.580 0.114195
## regionSacramento:year2016 2.189e-01 4.736e-02 4.623 3.80e-06 ***
## regionSanDiego:year2016 4.433e-02 4.736e-02 0.936 0.349267
## regionSanFrancisco:year2016 2.650e-01 4.736e-02 5.596 2.23e-08 ***
## regionSeattle:year2016 -1.171e-01 4.736e-02 -2.473 0.013404 *
## regionSouthCarolina:year2016 -1.449e-01 4.736e-02 -3.060 0.002217 **
## regionSouthCentral:year2016 -8.029e-02 4.736e-02 -1.695 0.090012 .
## regionSoutheast:year2016 -1.230e-01 4.736e-02 -2.597 0.009413 **
## regionSpokane:year2016 -6.202e-02 4.736e-02 -1.310 0.190334
## regionStLouis:year2016 -3.131e-01 4.736e-02 -6.611 3.92e-11 ***
## regionSyracuse:year2016 -2.077e-02 4.736e-02 -0.439 0.660973
## regionTampa:year2016 -8.731e-02 4.736e-02 -1.844 0.065251 .
## regionTotalUS:year2016 1.096e-02 4.736e-02 0.231 0.816951
## regionWest:year2016 -5.212e-02 4.736e-02 -1.101 0.271127
## regionWestTexNewMexico:year2016 -1.074e-02 4.741e-02 -0.226 0.820854
## regionAtlanta:year2017 -5.088e-02 4.713e-02 -1.080 0.280337
## regionBaltimoreWashington:year2017 2.115e-01 4.713e-02 4.488 7.25e-06 ***
## regionBoise:year2017 1.981e-02 4.713e-02 0.420 0.674245
## regionBoston:year2017 1.069e-01 4.713e-02 2.268 0.023347 *
## regionBuffaloRochester:year2017 -5.596e-02 4.713e-02 -1.187 0.235126
## regionCalifornia:year2017 1.189e-01 4.713e-02 2.523 0.011639 *
## regionCharlotte:year2017 9.496e-02 4.713e-02 2.015 0.043940 *
## regionChicago:year2017 2.117e-01 4.713e-02 4.491 7.12e-06 ***
## regionCincinnatiDayton:year2017 1.805e-02 4.713e-02 0.383 0.701811
## regionColumbus:year2017 -5.727e-02 4.713e-02 -1.215 0.224378
## regionDallasFtWorth:year2017 1.633e-05 4.713e-02 0.000 0.999724
## regionDenver:year2017 7.087e-02 4.713e-02 1.504 0.132705
## regionDetroit:year2017 -9.829e-02 4.713e-02 -2.085 0.037040 *
## regionGrandRapids:year2017 1.123e-01 4.713e-02 2.383 0.017189 *
## regionGreatLakes:year2017 -7.076e-04 4.713e-02 -0.015 0.988023
## regionHarrisburgScranton:year2017 2.504e-02 4.713e-02 0.531 0.595237
## regionHartfordSpringfield:year2017 4.143e-02 4.713e-02 0.879 0.379365
## regionHouston:year2017 -4.310e-02 4.713e-02 -0.914 0.360486
## regionIndianapolis:year2017 -1.113e-01 4.713e-02 -2.362 0.018208 *
## regionJacksonville:year2017 6.935e-02 4.713e-02 1.471 0.141188
## regionLasVegas:year2017 -5.010e-02 4.713e-02 -1.063 0.287846
## regionLosAngeles:year2017 1.258e-01 4.713e-02 2.669 0.007623 **
## regionLouisville:year2017 -3.643e-02 4.713e-02 -0.773 0.439599
## regionMiamiFtLauderdale:year2017 1.549e-01 4.713e-02 3.287 0.001016 **
## regionMidsouth:year2017 7.014e-02 4.713e-02 1.488 0.136728
## regionNashville:year2017 -1.363e-01 4.713e-02 -2.892 0.003836 **
## regionNewOrleansMobile:year2017 5.228e-02 4.713e-02 1.109 0.267311
## regionNewYork:year2017 6.631e-02 4.713e-02 1.407 0.159498
## regionNortheast:year2017 5.123e-02 4.713e-02 1.087 0.277109
## regionNorthernNewEngland:year2017 4.724e-03 4.713e-02 0.100 0.920160
## regionOrlando:year2017 8.178e-02 4.713e-02 1.735 0.082730 .
## regionPhiladelphia:year2017 5.299e-02 4.713e-02 1.124 0.260891
## regionPhoenixTucson:year2017 1.635e-02 4.713e-02 0.347 0.728647
## regionPittsburgh:year2017 -1.434e-01 4.713e-02 -3.042 0.002355 **
## regionPlains:year2017 -2.649e-02 4.713e-02 -0.562 0.574052
## regionPortland:year2017 2.843e-02 4.713e-02 0.603 0.546348
## regionRaleighGreensboro:year2017 2.202e-01 4.713e-02 4.671 3.01e-06 ***
## regionRichmondNorfolk:year2017 2.565e-02 4.713e-02 0.544 0.586360
## regionRoanoke:year2017 3.211e-02 4.713e-02 0.681 0.495754
## regionSacramento:year2017 2.209e-01 4.713e-02 4.688 2.78e-06 ***
## regionSanDiego:year2017 2.112e-01 4.713e-02 4.481 7.46e-06 ***
## regionSanFrancisco:year2017 2.458e-01 4.713e-02 5.215 1.86e-07 ***
## regionSeattle:year2017 7.805e-02 4.713e-02 1.656 0.097751 .
## regionSouthCarolina:year2017 -7.398e-02 4.713e-02 -1.570 0.116516
## regionSouthCentral:year2017 -4.827e-02 4.713e-02 -1.024 0.305789
## regionSoutheast:year2017 -1.622e-03 4.713e-02 -0.034 0.972549
## regionSpokane:year2017 1.051e-01 4.713e-02 2.229 0.025817 *
## regionStLouis:year2017 -1.065e-02 4.713e-02 -0.226 0.821183
## regionSyracuse:year2017 -3.868e-02 4.713e-02 -0.821 0.411787
## regionTampa:year2017 1.636e-01 4.713e-02 3.472 0.000519 ***
## regionTotalUS:year2017 8.012e-02 4.713e-02 1.700 0.089167 .
## regionWest:year2017 5.313e-02 4.713e-02 1.127 0.259636
## regionWestTexNewMexico:year2017 -7.563e-02 4.730e-02 -1.599 0.109859
## regionAtlanta:year2018 1.109e-02 7.733e-02 0.143 0.885972
## regionBaltimoreWashington:year2018 1.124e-01 7.733e-02 1.454 0.146096
## regionBoise:year2018 2.217e-01 7.733e-02 2.866 0.004156 **
## regionBoston:year2018 2.060e-01 7.733e-02 2.664 0.007725 **
## regionBuffaloRochester:year2018 -2.154e-01 7.733e-02 -2.786 0.005341 **
## regionCalifornia:year2018 1.983e-01 7.733e-02 2.564 0.010347 *
## regionCharlotte:year2018 9.647e-03 7.733e-02 0.125 0.900720
## regionChicago:year2018 2.605e-01 7.733e-02 3.369 0.000756 ***
## regionCincinnatiDayton:year2018 1.764e-01 7.733e-02 2.282 0.022523 *
## regionColumbus:year2018 7.372e-04 7.733e-02 0.010 0.992394
## regionDallasFtWorth:year2018 1.279e-01 7.733e-02 1.655 0.098035 .
## regionDenver:year2018 1.960e-01 7.733e-02 2.534 0.011284 *
## regionDetroit:year2018 -5.744e-02 7.733e-02 -0.743 0.457661
## regionGrandRapids:year2018 1.490e-02 7.733e-02 0.193 0.847176
## regionGreatLakes:year2018 5.500e-02 7.733e-02 0.711 0.476957
## regionHarrisburgScranton:year2018 -3.205e-02 7.733e-02 -0.414 0.678539
## regionHartfordSpringfield:year2018 3.263e-02 7.733e-02 0.422 0.673085
## regionHouston:year2018 9.692e-02 7.733e-02 1.253 0.210099
## regionIndianapolis:year2018 -7.173e-02 7.733e-02 -0.928 0.353643
## regionJacksonville:year2018 5.651e-02 7.733e-02 0.731 0.464972
## regionLasVegas:year2018 1.278e-01 7.733e-02 1.653 0.098372 .
## regionLosAngeles:year2018 3.021e-01 7.733e-02 3.906 9.41e-05 ***
## regionLouisville:year2018 7.641e-02 7.733e-02 0.988 0.323126
## regionMiamiFtLauderdale:year2018 6.353e-02 7.733e-02 0.821 0.411391
## regionMidsouth:year2018 1.099e-01 7.733e-02 1.421 0.155277
## regionNashville:year2018 4.821e-02 7.733e-02 0.623 0.533060
## regionNewOrleansMobile:year2018 3.939e-02 7.733e-02 0.509 0.610495
## regionNewYork:year2018 3.298e-02 7.733e-02 0.426 0.669761
## regionNortheast:year2018 3.333e-02 7.733e-02 0.431 0.666443
## regionNorthernNewEngland:year2018 5.080e-02 7.733e-02 0.657 0.511237
## regionOrlando:year2018 -4.183e-02 7.733e-02 -0.541 0.588600
## regionPhiladelphia:year2018 -3.526e-03 7.733e-02 -0.046 0.963637
## regionPhoenixTucson:year2018 1.008e-01 7.733e-02 1.303 0.192425
## regionPittsburgh:year2018 -2.888e-02 7.733e-02 -0.373 0.708831
## regionPlains:year2018 2.462e-02 7.733e-02 0.318 0.750255
## regionPortland:year2018 1.923e-01 7.733e-02 2.487 0.012884 *
## regionRaleighGreensboro:year2018 1.885e-01 7.733e-02 2.437 0.014800 *
## regionRichmondNorfolk:year2018 6.340e-02 7.733e-02 0.820 0.412336
## regionRoanoke:year2018 1.616e-01 7.733e-02 2.090 0.036619 *
## regionSacramento:year2018 1.210e-01 7.733e-02 1.564 0.117791
## regionSanDiego:year2018 3.066e-01 7.733e-02 3.965 7.38e-05 ***
## regionSanFrancisco:year2018 3.144e-02 7.733e-02 0.407 0.684315
## regionSeattle:year2018 1.357e-01 7.733e-02 1.755 0.079304 .
## regionSouthCarolina:year2018 -8.346e-02 7.733e-02 -1.079 0.280485
## regionSouthCentral:year2018 9.548e-02 7.733e-02 1.235 0.216963
## regionSoutheast:year2018 -8.878e-03 7.733e-02 -0.115 0.908600
## regionSpokane:year2018 1.275e-01 7.733e-02 1.649 0.099134 .
## regionStLouis:year2018 6.538e-02 7.733e-02 0.846 0.397840
## regionSyracuse:year2018 -1.757e-01 7.733e-02 -2.272 0.023093 *
## regionTampa:year2018 7.712e-02 7.733e-02 0.997 0.318681
## regionTotalUS:year2018 1.526e-01 7.733e-02 1.973 0.048481 *
## regionWest:year2018 1.622e-01 7.733e-02 2.098 0.035954 *
## regionWestTexNewMexico:year2018 9.199e-02 7.737e-02 1.189 0.234465
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.2415 on 18029 degrees of freedom
## Multiple R-squared: 0.6447, Adjusted R-squared: 0.6404
## F-statistic: 149.4 on 219 and 18029 DF, p-value: < 2.2e-16
model5pf <- lm(average_price ~ type + region + quarter + year + quarter:year, data = trimmed_avocados)
summary(model5pf)
##
## Call:
## lm(formula = average_price ~ type + region + quarter + year +
## quarter:year, data = trimmed_avocados)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.96042 -0.13634 -0.00203 0.13537 1.48398
##
## Coefficients: (3 not defined because of singularities)
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.259208 0.014541 86.600 < 2e-16 ***
## typeorganic 0.495932 0.003553 139.577 < 2e-16 ***
## regionAtlanta -0.223077 0.018461 -12.084 < 2e-16 ***
## regionBaltimoreWashington -0.026805 0.018461 -1.452 0.146526
## regionBoise -0.212899 0.018461 -11.532 < 2e-16 ***
## regionBoston -0.030148 0.018461 -1.633 0.102472
## regionBuffaloRochester -0.044201 0.018461 -2.394 0.016662 *
## regionCalifornia -0.165710 0.018461 -8.976 < 2e-16 ***
## regionCharlotte 0.045000 0.018461 2.438 0.014795 *
## regionChicago -0.004260 0.018461 -0.231 0.817490
## regionCincinnatiDayton -0.351834 0.018461 -19.058 < 2e-16 ***
## regionColumbus -0.308254 0.018461 -16.698 < 2e-16 ***
## regionDallasFtWorth -0.475444 0.018461 -25.754 < 2e-16 ***
## regionDenver -0.342456 0.018461 -18.550 < 2e-16 ***
## regionDetroit -0.284941 0.018461 -15.435 < 2e-16 ***
## regionGrandRapids -0.056036 0.018461 -3.035 0.002406 **
## regionGreatLakes -0.222485 0.018461 -12.052 < 2e-16 ***
## regionHarrisburgScranton -0.047751 0.018461 -2.587 0.009700 **
## regionHartfordSpringfield 0.257604 0.018461 13.954 < 2e-16 ***
## regionHouston -0.513107 0.018461 -27.794 < 2e-16 ***
## regionIndianapolis -0.247041 0.018461 -13.382 < 2e-16 ***
## regionJacksonville -0.050089 0.018461 -2.713 0.006669 **
## regionLasVegas -0.180118 0.018461 -9.757 < 2e-16 ***
## regionLosAngeles -0.345030 0.018461 -18.690 < 2e-16 ***
## regionLouisville -0.274349 0.018461 -14.861 < 2e-16 ***
## regionMiamiFtLauderdale -0.132544 0.018461 -7.180 7.25e-13 ***
## regionMidsouth -0.156272 0.018461 -8.465 < 2e-16 ***
## regionNashville -0.348935 0.018461 -18.901 < 2e-16 ***
## regionNewOrleansMobile -0.256243 0.018461 -13.880 < 2e-16 ***
## regionNewYork 0.166538 0.018461 9.021 < 2e-16 ***
## regionNortheast 0.040888 0.018461 2.215 0.026785 *
## regionNorthernNewEngland -0.083639 0.018461 -4.531 5.92e-06 ***
## regionOrlando -0.054822 0.018461 -2.970 0.002985 **
## regionPhiladelphia 0.071095 0.018461 3.851 0.000118 ***
## regionPhoenixTucson -0.336598 0.018461 -18.233 < 2e-16 ***
## regionPittsburgh -0.196716 0.018461 -10.656 < 2e-16 ***
## regionPlains -0.124527 0.018461 -6.745 1.57e-11 ***
## regionPortland -0.243314 0.018461 -13.180 < 2e-16 ***
## regionRaleighGreensboro -0.005917 0.018461 -0.321 0.748575
## regionRichmondNorfolk -0.269704 0.018461 -14.609 < 2e-16 ***
## regionRoanoke -0.313107 0.018461 -16.961 < 2e-16 ***
## regionSacramento 0.060533 0.018461 3.279 0.001044 **
## regionSanDiego -0.162870 0.018461 -8.822 < 2e-16 ***
## regionSanFrancisco 0.243166 0.018461 13.172 < 2e-16 ***
## regionSeattle -0.118462 0.018461 -6.417 1.43e-10 ***
## regionSouthCarolina -0.157751 0.018461 -8.545 < 2e-16 ***
## regionSouthCentral -0.459793 0.018461 -24.906 < 2e-16 ***
## regionSoutheast -0.163018 0.018461 -8.830 < 2e-16 ***
## regionSpokane -0.115444 0.018461 -6.253 4.11e-10 ***
## regionStLouis -0.130414 0.018461 -7.064 1.67e-12 ***
## regionSyracuse -0.040710 0.018461 -2.205 0.027452 *
## regionTampa -0.152189 0.018461 -8.244 < 2e-16 ***
## regionTotalUS -0.242012 0.018461 -13.109 < 2e-16 ***
## regionWest -0.288817 0.018461 -15.645 < 2e-16 ***
## regionWestTexNewMexico -0.296594 0.018502 -16.030 < 2e-16 ***
## quarter2 0.021204 0.009058 2.341 0.019248 *
## quarter3 0.082991 0.009058 9.162 < 2e-16 ***
## quarter4 -0.010357 0.009060 -1.143 0.252944
## year2016 -0.117821 0.009058 -13.007 < 2e-16 ***
## year2017 -0.056574 0.009058 -6.246 4.31e-10 ***
## year2018 -0.004613 0.009245 -0.499 0.617792
## quarter2:year2016 -0.028533 0.012810 -2.227 0.025932 *
## quarter3:year2016 0.095192 0.012810 7.431 1.12e-13 ***
## quarter4:year2016 0.256768 0.012811 20.043 < 2e-16 ***
## quarter2:year2017 0.208350 0.012812 16.262 < 2e-16 ***
## quarter3:year2017 0.312536 0.012810 24.398 < 2e-16 ***
## quarter4:year2017 0.261262 0.012696 20.578 < 2e-16 ***
## quarter2:year2018 NA NA NA NA
## quarter3:year2018 NA NA NA NA
## quarter4:year2018 NA NA NA NA
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.24 on 18182 degrees of freedom
## Multiple R-squared: 0.6461, Adjusted R-squared: 0.6448
## F-statistic: 502.9 on 66 and 18182 DF, p-value: < 2.2e-16
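Multiple \(R^2\) can never go down when terms are added. Adding region:quarter or region:year to the base model adds 159 coefficients each, so adjusted \(R^2\) is the fairer yardstick for comparing these fits.

As a quick sanity check, the sketch below recomputes adjusted \(R^2\) from the printed summaries using \(R^2_{adj} = 1 - (1 - R^2)\frac{n-1}{n-p-1}\). Here \(n\) is the number of rows and \(p\) is the number of non-intercept coefficients. The helper name `adj_r2` is hypothetical, and the inputs are read straight off the output above, with \(n\) = residual DF + \(p\) + 1.

```r
# Adjusted R^2 from multiple R^2, sample size n and number of
# non-intercept coefficients p (hypothetical helper for illustration)
adj_r2 <- function(r2, n, p) {
  1 - (1 - r2) * (n - 1) / (n - p - 1)
}

# n = residual DF + p + 1, read off the summaries above
n <- 18029 + 219 + 1  # 18249 rows

round(adj_r2(0.6431, n, p = 219), 4)  # model5pd: 0.6388
round(adj_r2(0.6447, n, p = 219), 4)  # model5pe: 0.6404
round(adj_r2(0.6461, n, p = 66), 4)   # model5pf (18182 residual DF): 0.6448
```

The recomputed values match the printed adjusted \(R^2\) to four decimal places. Among these three fits, model5pf has the highest adjusted \(R^2\) despite having by far the fewest coefficients.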
So it looks like model5pa, with type, region, quarter, year and type:region, is the best, with a moderate gain in multiple \(R^2\) due to the interaction. However, the associated coefficients have a mix of \(p\)-values, so we need to test whether the interaction as a whole is significant:
anova(model5, model5pa)
Neat, it looks like including the interaction is statistically justified, so we can keep it in. Our final model is:
average_price ~ type + region + quarter + year + type:region
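To see what anova() does when it compares two nested models like model5 and model5pa, here is a minimal sketch of the partial F-test computed by hand on simulated toy data (the data frame toy and the models reduced and full are hypothetical). With \(RSS_0, df_0\) the residual sum of squares and residual degrees of freedom of the smaller model, and \(RSS_1, df_1\) those of the bigger one, the statistic is \(F = \frac{(RSS_0 - RSS_1)/(df_0 - df_1)}{RSS_1/df_1}\):

```r
# Hypothetical toy data: y depends on x1, with a different slope in group "b"
set.seed(1)
n <- 200
toy <- data.frame(
  x1 = rnorm(n),
  g  = factor(sample(c("a", "b"), n, replace = TRUE))
)
toy$y <- 1 + 2 * toy$x1 + 1.5 * toy$x1 * (toy$g == "b") + rnorm(n)

# smaller (no interaction) and bigger (with interaction) nested models
reduced <- lm(y ~ x1 + g, data = toy)
full    <- lm(y ~ x1 + g + x1:g, data = toy)

# partial F statistic by hand from the residual sums of squares
rss_reduced <- sum(residuals(reduced)^2)
rss_full    <- sum(residuals(full)^2)
df_extra    <- df.residual(reduced) - df.residual(full)  # number of extra coefficients
f_manual    <- ((rss_reduced - rss_full) / df_extra) / (rss_full / df.residual(full))
p_manual    <- pf(f_manual, df_extra, df.residual(full), lower.tail = FALSE)

# anova() reports the same F and p-value in the second row of its table
aov_table <- anova(reduced, full)
aov_table$F[2]
aov_table$`Pr(>F)`[2]
```

Listing the smaller model first, as here, gives positive Df and Sum of Sq columns in the anova() table; reversing the order gives the same F and p-value, just with negative signs in those columns.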
If you wanted to build a predictive model with automated variable selection instead, you could follow a similar process using the following code:
library(leaps)
regsubsets_forward <- regsubsets(average_price ~ .,
data = trimmed_avocados,
nvmax = 12,
method = "forward")
plot(regsubsets_forward)
From the plot, the best-performing model includes type, year, and some (but not all) of the region and quarter dummy variables.
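regsubsets() treats each dummy column (regionHouston, quarter3 and so on) as a separate candidate, which is why only some regions and quarters make it in. If you'd rather a factor enter or leave the model as a whole, one alternative (a swapped-in technique, not what regsubsets() does) is base R's step(); setting its penalty k = log(n) turns the default AIC into a BIC-style criterion. A minimal sketch on hypothetical toy data (toy, x1, x2 and g are made up):

```r
# Hypothetical toy data: y depends on x1 and on factor g, but not on x2
set.seed(2)
toy <- data.frame(
  x1 = rnorm(300),
  x2 = rnorm(300),
  g  = factor(sample(c("a", "b", "c"), 300, replace = TRUE))
)
toy$y <- 1 + 2 * toy$x1 + ifelse(toy$g == "c", 1, 0) + rnorm(300)
n <- nrow(toy)

# start from the intercept-only model and add one whole term at a time;
# k = log(n) gives a BIC-style penalty instead of the default AIC (k = 2)
null_model <- lm(y ~ 1, data = toy)
forward_fit <- step(
  null_model,
  scope = ~ x1 + x2 + g,
  direction = "forward",
  k = log(n),
  trace = 0
)

# selected terms: g is either in as a whole factor or not at all
attr(terms(forward_fit), "term.labels")
```

The selected model should contain x1 and, most likely, g, but never just one of g's dummy columns.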
We can then plot the BIC score:
# plot BIC against the number of predictors in each model
plot(summary(regsubsets_forward)$bic, type = "b")
From this, it seems like the BIC doesn’t get much lower after including 8 variables. We can check which variables these are:
summary(regsubsets_forward)$which[8, ]
## (Intercept) total_volume x4046
## TRUE FALSE FALSE
## small_bags large_bags x_large_bags
## FALSE FALSE FALSE
## typeorganic year2016 year2017
## TRUE FALSE TRUE
## year2018 regionAtlanta regionBaltimoreWashington
## FALSE FALSE FALSE
## regionBoise regionBoston regionBuffaloRochester
## FALSE FALSE FALSE
## regionCalifornia regionCharlotte regionChicago
## FALSE FALSE FALSE
## regionCincinnatiDayton regionColumbus regionDallasFtWorth
## FALSE FALSE FALSE
## regionDenver regionDetroit regionGrandRapids
## FALSE FALSE FALSE
## regionGreatLakes regionHarrisburgScranton regionHartfordSpringfield
## FALSE FALSE TRUE
## regionHouston regionIndianapolis regionJacksonville
## TRUE FALSE FALSE
## regionLasVegas regionLosAngeles regionLouisville
## FALSE FALSE FALSE
## regionMiamiFtLauderdale regionMidsouth regionNashville
## FALSE FALSE FALSE
## regionNewOrleansMobile regionNewYork regionNortheast
## FALSE TRUE FALSE
## regionNorthernNewEngland regionOrlando regionPhiladelphia
## FALSE FALSE FALSE
## regionPhoenixTucson regionPittsburgh regionPlains
## FALSE FALSE FALSE
## regionPortland regionRaleighGreensboro regionRichmondNorfolk
## FALSE FALSE FALSE
## regionRoanoke regionSacramento regionSanDiego
## FALSE FALSE FALSE
## regionSanFrancisco regionSeattle regionSouthCarolina
## TRUE FALSE FALSE
## regionSouthCentral regionSoutheast regionSpokane
## FALSE FALSE FALSE
## regionStLouis regionSyracuse regionTampa
## FALSE FALSE FALSE
## regionTotalUS regionWest regionWestTexNewMexico
## FALSE FALSE FALSE
## quarter2 quarter3 quarter4
## FALSE TRUE TRUE
Looking at the TRUE entries, the best 8-variable model includes type, one year dummy, some regions and some quarters. We can include type and year in our model, and then test whether quarter and region should be added.
# test if we should put regions in
mod_type_year <- lm(average_price ~ type + year, data = trimmed_avocados)
mod_type_region <- lm(average_price ~ type + year + region, data = trimmed_avocados)
anova(mod_type_year, mod_type_region)
# yep, it's significant so we can put that in.
# test if we should put quarter in
mod_type_year <- lm(average_price ~ type + year, data = trimmed_avocados)
mod_type_quarter <- lm(average_price ~ type + year + quarter, data = trimmed_avocados)
anova(mod_type_year, mod_type_quarter)
# yep, it's significant so we can put that in.
# now test whether adding quarter to the model with region improves it
mod_type_region_quarter <- lm(average_price ~ type + year + region + quarter, data = trimmed_avocados)
anova(mod_type_region, mod_type_region_quarter)
# yep, that's significant so I would leave it in.
You can continue to test your interactions in the same way as we did during the manual version above.
We didn’t use this in class, but if you're interested in how to do model building with glmulti(), you can run the code below.
Automated approach: glmulti()
library(glmulti)
This data is pretty big for glmulti on a single CPU core, so we probably won’t be able to search for main effects and pairwise interactions simultaneously. First we set aside 20% of the rows as a test set; then we look for the best main effects model on the training set, using BIC as our metric:
# we're putting set.seed() in here for reproducibility, but you shouldn't include
# this in production code
set.seed(42)
n_data <- nrow(trimmed_avocados)
test_index <- sample(1:n_data, size = n_data * 0.2)
test <- slice(trimmed_avocados, test_index)
train <- slice(trimmed_avocados, -test_index)
# sanity check
nrow(test) + nrow(train) == n_data
nrow(test)
nrow(train)
glmulti_fit <- glmulti(
average_price ~ .,
data = train,
level = 1, # 2 = include pairwise interactions, 1 = main effects only (main effect = no pairwise interactions)
minsize = 1, # no min size of model
maxsize = -1, # -1 = no max size of model
marginality = TRUE, # marginality here means the same as 'strongly hierarchical' interactions, i.e. include pairwise interactions only if both predictors present in the model as main effects.
method = "h", # try exhaustive search, or could use "g" for genetic algorithm instead
crit = bic, # criterion for model selection is BIC value (lower is better)
plotty = FALSE, # don't plot models as function runs
report = TRUE, # do produce reports as function runs
confsetsize = 10, # return best 10 solutions
fitfunction = lm # fit using the `lm` function
)
summary(glmulti_fit)
So the lowest BIC model with main effects is average_price ~ type + year + quarter + total_volume + x_large_bags + region. Let’s look at possible extensions to it. We’re going to deliberately push on to the point where models start to overfit (as measured by RMSE on the test set), so that we can see what this looks like.
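# empty results table with typed columns, so that add_row() can append to it
results <- tibble(
name = character(), bic = numeric(), rmse_train = numeric(), rmse_test = numeric()
)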
# lowest BIC model with main effects
lowest_bic_model <- lm(average_price ~ type + year + quarter + total_volume + x_large_bags + region, data = train)
results <- results %>%
add_row(
tibble_row(
name = "lowest bic",
bic = bic(lowest_bic_model),
rmse_train = rmse(lowest_bic_model, train),
rmse_test = rmse(lowest_bic_model, test)
)
)
# try adding in all possible pairs with these main effects
lowest_bic_model_all_pairs <- lm(average_price ~ (type + year + quarter + total_volume + x_large_bags + region)^2, data = train)
results <- results %>%
add_row(
tibble_row(
name = "lowest bic all pairs",
bic = bic(lowest_bic_model_all_pairs),
rmse_train = rmse(lowest_bic_model_all_pairs, train),
rmse_test = rmse(lowest_bic_model_all_pairs, test)
)
)
# try a model with all main effects
model_all_mains <- lm(average_price ~ ., data = train)
results <- results %>%
add_row(
tibble_row(
name = "all mains",
bic = bic(model_all_mains),
rmse_train = rmse(model_all_mains, train),
rmse_test = rmse(model_all_mains, test)
)
)
# try a model with all main effects and all pairs
model_all_pairs <- lm(average_price ~ .^2, data = train)
results <- results %>%
add_row(
tibble_row(
name = "all pairs",
bic = bic(model_all_pairs),
rmse_train = rmse(model_all_pairs, train),
rmse_test = rmse(model_all_pairs, test)
)
)
# try a model with all main effects, all pairs and one triple (this is getting silly)
model_all_pairs_one_triple <- lm(average_price ~ .^2 + region:type:year, data = train)
results <- results %>%
add_row(
tibble_row(
name = "all pairs one triple",
bic = bic(model_all_pairs_one_triple),
rmse_train = rmse(model_all_pairs_one_triple, train),
rmse_test = rmse(model_all_pairs_one_triple, test)
)
)
# try a model with all main effects, all pairs and multiple triples (more silly)
model_all_pairs_multi_triples <- lm(average_price ~ .^2 + region:type:year + region:type:quarter + region:year:quarter, data = train)
results <- results %>%
add_row(
tibble_row(
name = "all pairs multi triples",
bic = bic(model_all_pairs_multi_triples),
rmse_train = rmse(model_all_pairs_multi_triples, train),
rmse_test = rmse(model_all_pairs_multi_triples, test)
)
)
results <- results %>%
pivot_longer(cols = bic:rmse_test, names_to = "measure", values_to = "value") %>%
mutate(
name = fct_relevel(
as_factor(name),
"lowest bic", "all mains", "lowest bic all pairs", "all pairs", "all pairs one triple", "all pairs multi triples"
)
)
results %>%
filter(measure == "bic") %>%
ggplot(aes(x = name, y = value)) +
geom_col(fill = "steelblue", alpha = 0.7) +
labs(
x = "model",
y = "bic"
) +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
geom_hline(aes(yintercept = 0))
BIC is telling us here that if we took our lowest-BIC main effects model and added in all possible pairwise interactions, this would likely still improve the model for predictive purposes. BIC suggests that this ‘lowest bic all pairs’ model will offer the best predictive performance without overfitting, with all the other models scoring noticeably worse.
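As a reminder of what BIC rewards and penalises: \(\mathrm{BIC} = -2\log\hat{L} + k\log n\), where \(\hat{L}\) is the maximised likelihood, \(k\) the number of estimated parameters and \(n\) the number of rows. The \(k\log n\) term is what stops BIC from always preferring the biggest model. A minimal sketch on hypothetical toy data, checking a hand calculation against base R's BIC() for a linear model:

```r
# Hypothetical toy data and a simple linear model
set.seed(4)
n <- 100
toy <- data.frame(x = rnorm(n))
toy$y <- 3 + 0.5 * toy$x + rnorm(n)
fit <- lm(y ~ x, data = toy)

rss <- sum(residuals(fit)^2)
k   <- length(coef(fit)) + 1  # coefficients plus the residual variance
# maximised Gaussian log-likelihood of a linear model
log_lik    <- -n / 2 * (log(2 * pi) + log(rss / n) + 1)
bic_manual <- -2 * log_lik + log(n) * k

bic_manual
BIC(fit)  # base R gives the same value
```

With roughly 14,600 rows in our training set, each extra coefficient costs about \(\log n \approx 9.6\) BIC points, so a block of interaction terms has to improve the fit substantially to be worth keeping.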
Let’s compare the RMSE values of the various models for train and test sets. We expect train RMSE always to go down as model complexity increases, but what happens to the test RMSE as models get more complex?
results %>%
filter(measure != "bic") %>%
ggplot(aes(x = name, y = value, fill = measure)) +
geom_col(position = "dodge", alpha = 0.7) +
labs(
x = "model",
y = "rmse"
) +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
The lowest test RMSE is obtained for the ‘lowest bic all pairs’ model, and test RMSE increases for the more complex models beyond it, which suggests that these models are overfitting the training data.
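A single 80/20 split gives one, somewhat noisy, estimate of test error. k-fold cross-validation instead uses every row for testing exactly once and averages the RMSE over the folds. Here is a minimal base R sketch on simulated toy data; the data, the helper names rmse_on() and cv_rmse(), and the polynomial models are all hypothetical (with the tidyverse you might reach for modelr’s crossv_kfold() instead):

```r
# Hypothetical toy data: a smooth curve plus noise
set.seed(5)
n <- 200
toy <- data.frame(x = runif(n, -2, 2))
toy$y <- sin(toy$x) + rnorm(n, sd = 0.3)

# root mean squared error of a fitted model on a given data set
rmse_on <- function(model, data) {
  sqrt(mean((data$y - predict(model, newdata = data))^2))
}

# k-fold CV: fit on k - 1 folds, score on the held-out fold, average the RMSEs
cv_rmse <- function(formula, data, k = 10) {
  folds <- sample(rep(seq_len(k), length.out = nrow(data)))
  fold_rmse <- vapply(seq_len(k), function(i) {
    fit <- lm(formula, data = data[folds != i, ])
    rmse_on(fit, data[folds == i, ])
  }, numeric(1))
  mean(fold_rmse)
}

# compare models of increasing complexity (polynomial degree)
degrees <- c(1, 3, 15)
cv_results <- sapply(degrees, function(d) cv_rmse(y ~ poly(x, d), toy))
names(cv_results) <- paste0("degree_", degrees)
cv_results
```

The straight line underfits the curve, so the cubic should have the lower cross-validated RMSE; the same comparison could be run on the avocado models above by swapping in their formulas and average_price as the response.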